Can't decide whether normalization or de-normalization would work

Question

this question may be a bit broad!

Recently I am learning MySQL. I am making a school exam information database. I have created a table called subjects where information regarding various subjects are stored including full name or full marks for each subject. I have also created another table called exams where marks obtained by each student on each subject is stored. Obviously subjects is the master table and exams is the child table here. Based on the data stored in these 2 tables, when joined together, information like percentage and grade for each student is generated.

But what if, the certain information in the subjects table changes in future. Say full marks get changed for a particular subject. It that case the older records in the exams table will be invalid or wrong because now the join will produce new values which are correct for current records but wrong for older records.

What should I do in this kind of situation? Do I denormalize both tables into a complete table? But my timid knowledge in database design says that is wrong practice!

Any help or insight will be highly appreciated.

from what i hear you have 3 tables so far not 2. students, subjects, exams. Keep the stuff normalized and have intersect tables. See [Junction tables](http://stackoverflow.com/a/32620163) — Drew, Oct 24 '16 at 06:10
Search on "historical data". Eg http://stackoverflow.com/questions/3874199/how-to-store-historical-data?s=2|4.9491 — philipxy, Oct 24 '16 at 09:16

score 0 · Accepted Answer · answered Oct 24 '16 at 06:12

0

You can add a new row to the subjects-table instead of changing an existing one when a subject changes. So, the subject - table contains versions of subjects. This can be combined with using validity dates. You will need a normalization step after applying the approach. This might be what is meant by one of the comments.

answered Oct 24 '16 at 06:12

mm759

1,350
1
7
7

Thanks for the solution. Never thought that. But I was thinking, wouldn't that mean repetition of same data in the `subjects` table? Because apart from the full marks, other column data like subject name and subject type will be repeated many times..... or maybe that doesn't matter. – Tamonash Gupta Oct 24 '16 at 08:23
@TamonashGupta Re "wouldn't that mean repetition": Yes. If that's a *demonstrated problem*, see my answer. – philipxy Oct 24 '16 at 08:55
This is the reason why I added the following sentence to the answer: "You will need a normalization step after applying the approach." So, you can split subjects into two tables. – mm759 Oct 25 '16 at 09:45

score 0 · Answer 2 · edited May 23 '17 at 10:32

You need to model your historical situation(s), just as you modeled your current situation.

You may or may not choose to denormalize past data. But the most benefit per effort actually involves having historical situation tables look like or actually be the current situation tables. This involves investing in further normalizing current rows into subrows that you then extend by a date so that you can join together subrows from the same date.

If duplicated data is/becomes a demonstrated burden then you can split up a table into multiple tables, for subrows that you want to date the most recent change to while other subrows remain dated as of an earlier change. Then you can join up rows that are the most recent that agree as of a given date. In some "temporal" databases we minimize redundant data by splitting rows into subrows and not just dating them but labeling them with a date range during which they were current.

From a recent answer of mine:

Proposals to hard delete assume that you keep desired historical data. Do not limit your thinking about achieving this to merely nulling FKs, cascading, adding a flag/date column to an extant table or anything else. Properly model both present & past including database changes that need to occur as a DBMS transaction upon each chosen application situation change. Proposals to soft-delete just involve putting certain current and historical data into the same table vs different ones. This only works for very simple models of current & historical situations.

It is usually straightforward to design a database for only a current application situation. But if we do care about the past we typically only care about some of it. If so, upon certain application situation changes from current to past we can copy a snapshot of the relevant current state into historical state. Labeling data with soft-delete flags vs dates is the combined-table version of undated vs dated historical data, where we only care about current vs past situations and we only care that vs when a change occurred.

"Temporal" databases more or less record the current situation and a bunch of dated once-current situations. This recording of past data using the structure for current data simplifies understanding & querying of current & past data. (The querying about intervals of time that a temporal database can facilitate can get quite complicated.) But it turns out that making a temporal version of a given current-data design does not just involve adding date columns to extant current-data tables. It requires remodeling current data, breaking it into smaller tables with more constraints. This is because different kinds of application situation changes require dating different column combinations of the extant current-data design. (Hard and soft historical snapshot designs must address this, but for a limited past/history.)

Thanks @philipxy for the insight. Designing and programming historical data is likely to be over complicated in my case. So as you mentioned somewhere that "only very simple cases can get away with some bit(s) being active or not", I will simply flag my records for now. — Tamonash Gupta, Oct 24 '16 at 14:14
On a second thought, can you please give an example how to model historical situation as per your proposal. I would like to learn this. — Tamonash Gupta, Oct 24 '16 at 14:19
My proposal is to decide on what current and past stuff you want to record. Your question basically says, 'if I do X I won't get what I want'. So don't do X. The way to find what to do to get what you want is to *design*. For now: Play with current data student(id, name, addr). Make a sequence of changes in the application situation. When you go to ask about the past and you can't get what you want, *remodel the current & past*. Try an is-current flag, a version #, a date and/or copying (sub)rows. Add class(id, name) & exam(sid,cid, mark) and repeat. — philipxy, Oct 24 '16 at 20:19
OK, I am trying. I am currently using an `isactive` column for marking old and new records in my `subjects` table. The corresponding program will look for the records where `isactive = true` and thus the new records will be identified separately. — Tamonash Gupta, Oct 25 '16 at 13:32

Can't decide whether normalization or de-normalization would work

2 Answers2

Linked