Renormalizing when fields depend on non candidate keys but usage would suggest to not split the table

Question

I'll use a simple example from the financial sector to illustrate the issue. Specifically, a table that describes financial instruments (only stocks, futures and options).

I'm going to simplify the table to make the example as small and easy as possible (i.e. it's not realistic).

'Table v1.0' columns: name, term, type.

'name' would be one of stock, future or option.

'term' is a date. This is always Null for stocks, as it actually only applies to the other two.

'type' is Put or Call for options and Null for the others.

Note that 'name' is not a candidate key (it would be for stock, but not for futures and options). 'term' depends on 'name' (it's Null for stock), 'type' depends on 'name' and 'term' (as it only applies to options).

This is definitely not 3rdN as far as I can tell.

'Table v2.0' columns: name, term.

'name' would be one of stock, future, call or put.

'term' is the same as in 1.0.

This respects 1stN only because I shortened 'Call Option' and 'Put Option', and still has the 3rdN problem on 'term'.

Apparently the specifications of these instruments are incompatible and I should have a table for each of them (even though the stock one would have only 1 entry). This would be annoying as other tables would use the row id from this table as foreign key to link information about, say, a trade. If I split in 3 tables, I would need a 4th in order to detect which of the 3 to access to link trade and instrument.

Would it be that bad to stick with design 1.0 (considering that data correctness for this table is already guaranteed before insertion)? Is there a pattern to use in these cases that avoids having a table for each kind of instrument?

philipxy · Accepted Answer · 2019-12-15T00:01:19.423

Normalization does not treat NULL specially. But SQL does. Its NATURAL JOIN, = & other operators are not the operators by the same name as relational theory or arithmetic. Also its use of other terms differs, like PK (primary key) & UNIQUE. When a SQL table has all CKs (candidate keys) with nullable columns, it may be decomposable--reconstructable by natural join from components--but in SQL that means recombining by INNER JOIN with an ON involving shared columns being = or NULL. Also PK & UNIQUE constraints treat NULLs specially in a way that cannot enforce CKs, superkeys or uniqueness in its normal sense. Similarly for FKs (foreign keys) & referential integrity.

You are using terms but you are not clear about their meanings & don't appear to be applying them correctly. You may see this if you work through all your steps to justify your claims about FDs (functional dependencies), CKs & NFs (normal forms) by referencing definitions. Eg you say "depends on" but it's not in the normalization-relevant sense of "is functionally dependent on".

What to do with null values when modeling and normalizing?
Does an empty SQL table have a superkey? Does every SQL table?
"1NF" has no single meaning.

Ideally, design without nulls, then combine tables on null-free CKs via LEFT JOINs to introduce nulls.

Information modeling methods tend to produce designs that don't have such null problems. Time to follow a published academic textbook on information modelling, the relational model & DB design & querying.

Anyway you specifically have a case of subtyping. Cases of that general notion get called things like subtyping, inheritance & polymorphism. There are database idioms for that. Of course, they involve appropriate judicious tables, CKs, non-CK nullable columns & reconstruction of a supertype table via LEFT JOIN.

How can you represent inheritance in a database?
How do you effectively model inheritance in a database?

You may be using EAV (Entity-Attribute-Value) design inappropriately. It's a common anti-pattern for representing subtyping. Eg you say a table "would have only 1 entry" here, which suggests it is effectively metadata & likely belongs in the DBMS-managed metadata as a consequence of appropriate tables & constraints having been declared. (But you don't give enough detail re your application or your design for us to know.)

Designing an EAV database correctly for historical data

Great answer, thansk. That table is just a bunch of metadata that effectively represents the equivalent of a 'person full name', but it is useful to have split in its component as sometimes I filter on those. No inheritance going on, it's just that their fields sort of match as if it was. If I split as per inheritance links my main concern is 'convenience' as other tables can have entries that can be either 'stock' or 'future' or 'option', so if I want the 'full name' I might have to join each of the 3 row subsets on a different table, and have a column to identify which one for a given row. — Dirich, Dec 15 '19 at 11:17
Those links' "inheritance" is just subtyping, as in your subtypes of instrument. But I really don't know how they fit into your application/design. You call it metadata but I suspect it is really just data: Metadata is properly data re the *tables* independent of current contents & application; colour being a property/attribute of a car can be taken to imply that there are different "types" of car--one per colour--but colour is still just car data--but when 1:1 with tables it's *like* metadata & logically redundant. Re next post, maybe ask re some tiny designs, just entities with your "types". — philipxy, Dec 15 '19 at 11:56

Renormalizing when fields depend on non candidate keys but usage would suggest to not split the table

1 Answers1