48

There is the "put a FK to your parent" method, i.e. each records points to it's parent.
Which is a hard for read actions but very easy to maintain.

And then there is a "directory structure key" method:

0001.0000.0000.0000 main branch 1
0001.0001.0000.0000 child of main branch one
etc

Which is super easy to read, but hard to maintain.
What are the other ways and their cons/pros?

Itay Moav -Malimovka
  • 48,785
  • 58
  • 182
  • 262
  • 2
    See http://stackoverflow.com/questions/846201/fast-relational-method-of-storing-tree-data-for-instance-threaded-comments-on-ar for nested sets, key naming schemes and other approaches to squeezing hierarchies into relations. – Constantin Jul 29 '10 at 13:20
  • @Constantin isn't that the second way I showed here? – Itay Moav -Malimovka Jul 29 '10 at 16:01
  • We are using the old-age method of referring to the parent using FK for storing hierarchical data and are almost happy. To load lots of data, we make an XML query and deserialise to objects. – Kangkan Jul 29 '10 at 13:02
  • 1
    On first blush, tree structures and RDBMs are a [horrible fit](http://dev.mysql.com/tech-resources/articles/hierarchical-data.html). If I've ever seen a use case for [Structured Storage](http://en.wikipedia.org/wiki/Nosql), this is it. – msw Jul 29 '10 at 13:04
  • 2
    +1 that "horrible fit" link gives examples of how to do it using 2 structures and some suggested reading. – phkahler Jul 29 '10 at 13:14
  • "Best" implementation depends on data structure and read/write characteristics. For a near-comprehensive community-updated list of options to consider what works best for you: https://stackoverflow.com/questions/4048151 – Kache Jun 17 '18 at 19:00

4 Answers4

59

As always: there is no best solution. Each solution makes different things easier or harder. The right solution for you depends on which operation you will do most.

Naive Approach with parent-id:

Pros:

  • Easy to implement

  • Easy to move a big subtree to another parent

  • Insert is cheap

  • Needed Fields directly accessible in SQL

Cons:

  • Retrieving a whole tree is recursive and therefore expensive

  • Finding all parents is expensive too ( SQL doesn't know recursions... )

Modified Preorder Tree Traversal ( saving a start- & end-point) :

Pros:

  • Retrieving a whole tree is easy and cheap

  • Finding all parents is cheap

  • Needed Fields directly accessible in SQL

  • Bonus: you're saving the order of childnodes within its parentnode too

Cons:

  • Inserting / Updating can be very expensive, as you'll maybe have to update a lot of nodes

Saving a path in each Node:

Pros:

  • Finding all parents is cheap

  • Retrieving a whole tree is cheap

  • Inserting is cheap

Cons:

  • Moving a whole tree is expensive

  • Depending on the way you save the path, you won't be able to work with it directly in SQL, so you'll always need to fetch & parse it, if you want to change it.

Closure table

Pros:

  • Easy to implement

  • Finding all parents is cheap

  • Inserting is cheap

  • Retrieving whole trees is cheap

Cons:

  • Needs an additional table

  • Takes up a lot of space compared to other approaches

  • Moving a subtree is expensive

I'd prefer one of the last two, depending on how often the data changes.

See also: http://media.pragprog.com/titles/bksqla/trees.pdf

yes sure
  • 97
  • 1
  • 11
Baju
  • 2,264
  • 1
  • 15
  • 24
  • I agree with you: there's no best solution at all. Anyway, there are several different ways to represent Tree Data Structure, and several kinds of techniques to fit those representations in a relational model. Normally, I prefer to distinguish two cases: 1. read-intensive and 2. write-intensive. According to the specific case, one model fits bettern than another: for example, if write-intensive, writing a lot of pointers on a denormalized table can be quite expensive, degrading the performances. – Paolo Maresca Apr 27 '14 at 10:30
  • For the third option use "Materialized Path" in your favorite search engine to find out more. Took me a while to find a name for it. The second option is also called "Nested Set". – Fredrik Boström Mar 17 '15 at 11:39
21

Modified Preorder Tree Traversal

This is a method which uses a non-recursive function (usually a single line of SQL) to retrieve trees from the database, at the cost of being a little trickier to update.

Diagram showing numbered hierarchical tree**

Section 2 of the Sitepoint article Storing Hierarchical Data in a Database for more detail.

SLCH000
  • 1,055
  • 1
  • 11
  • 14
TRiG
  • 9,249
  • 6
  • 50
  • 101
  • 2
    That article conveniently ignores that the tree must perhaps be updated later on, and would thus require that a lot of nodes be renumbered to keep it consistent. – Lasse V. Karlsen Jul 29 '10 at 13:59
  • 5
    The article doesn't ignore that at all. It explicitly says on page 3 that you have to update the node numbers and gives the SQL to do that. – siride Jul 29 '10 at 14:22
  • 2
    It should be noted that this model is great when you have considerably more reads than writes - if you're expecting large number of writes (e.g. perhaps 1 write for every 4 reads) this model can be problematic and something like a simple parent ID (and more expensive reads) may be preferable. – mindplay.dk Jul 15 '14 at 17:55
  • `Modified Preorder Tree Traversal` is a beautiful way to solve this problem, in case not modify the tree structure frequently. – Bin May 19 '17 at 06:32
5

I'd say the "golden way" to store a hierarchical data structure is to use a hierarchical database. Such as, for instance, HDB. That's a relational database that handles trees quite well. If you want something more powerful, LDAP might do for you.

A SQL database is ill-suited to this abstract topology.

Borealid
  • 86,367
  • 8
  • 101
  • 120
0

I don't think it's hard to build a tree structure with a relational database.

However, an object oriented database would work much better for this purpose.

Using an object oriented database:

parent has a set of child1  
child1 has a set of child2  
child2 has a set of child3  
...  
...

In an object oriented database you can build this structure pretty easily.

In a relational database, you will have to maintain foreign keys to parent.

parent  
id  
name  

child1  
parent_fk  
id  
name 

child2  
parent_fk  
id  
name  

..

Essentially, while you are building your tree structure you will have to join all these tables, or you can iterate through them.

foreach(parent in parents){
   foreach(child1 in parent.child1s)
    foreach(child2 in child1.child2s)

...
Matt Fenwick
  • 44,546
  • 19
  • 115
  • 184
DarthVader
  • 46,241
  • 67
  • 190
  • 289
  • Iteratin the database in such a way can be very expensive, mainly if you have to retrieve only a subtree from a large tree structure. – GergelyPolonkai May 01 '12 at 14:26