6

I have some InnoDB tables with only two INT columns, both of which are foreign keys to the primary keys of other tables.

For example, one table is user_items. It has two columns, userId and itemId, which are foreign keys to the user and item tables and are set to cascade on update or delete.

Should I add a 3rd column to such tables and make it a primary key, or is it better the way it is right now, in terms of performance or any other benefits?

Click Upvote
  • In this case it sounds like you should stick to using your natural key as your primary key. Using a surrogate key as your primary key is useful when you have a complex natural key, but in this case I don't think there's much to be gained here. – ta.speot.is Jun 24 '12 at 10:01

3 Answers

6

Adding a third ID column just for the sake of adding an ID column makes no sense. In fact it simply adds processing overhead (index maintenance) when you insert or delete rows.

A primary key is not necessarily "an ID column".

If you only allow a single association between user and item (a user cannot be assigned the same item twice) then it does make sense to define (userid, itemid) as the primary key of your table.

If you do allow the same pair to appear more than once then of course you don't need that constraint.
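As a minimal sketch of the composite key described above, the snippet below declares (userId, itemId) as the primary key and shows the database refusing a duplicate pair. It uses Python's built-in sqlite3 module as a stand-in so it runs anywhere; that is an assumption made for illustration, not the asker's MySQL/InnoDB setup, and the `assign` helper is hypothetical. The table and column names come from the question.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL/InnoDB so the sketch is runnable as-is.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY);
    CREATE TABLE item (id INTEGER PRIMARY KEY);
    CREATE TABLE user_items (
        userId INTEGER NOT NULL REFERENCES user(id) ON UPDATE CASCADE ON DELETE CASCADE,
        itemId INTEGER NOT NULL REFERENCES item(id) ON UPDATE CASCADE ON DELETE CASCADE,
        PRIMARY KEY (userId, itemId)  -- the natural key doubles as the primary key
    );
    INSERT INTO user (id) VALUES (1);
    INSERT INTO item (id) VALUES (10);
""")


def assign(user_id, item_id):
    """Hypothetical helper: link a user to an item, False if the database rejects the row."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO user_items (userId, itemId) VALUES (?, ?)",
                (user_id, item_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


first = assign(1, 10)   # new pair
second = assign(1, 10)  # same pair again, blocked by the primary key
row_count = conn.execute("SELECT COUNT(*) FROM user_items").fetchone()[0]
print(first, second, row_count)
```

No third column is involved: the two foreign key columns together identify the row, and the key itself is what prevents the duplicate.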

a_horse_with_no_name
  • What's the use of defining (userId, itemId) as the primary key? – Click Upvote Jun 24 '12 at 10:19
  • @ClickUpvote: as I said: if there is a requirement that a user can only be assigned once to an item (which I don't know - only you know), you want to ensure this requirement is not violated by defining that combination as a primary key (= unique constraint) – a_horse_with_no_name Jun 24 '12 at 10:21
  • @ClickUpvote And if there is a requirement for multiple associations between same user and item, you'd need a field for distinguishing between them anyway, and you'd just add _it_ to PK. In either case, you can't just leave the table PK-less. Even if you tried, InnoDB will create a hidden PK for you, since it must have _some_ PK on which to cluster the data. – Branko Dimitrijevic Jun 24 '12 at 11:22
  • There is a requirement that a user can only be assigned to one item, however this is enforced in the code, i.e., before inserting a row into this table it's checked to make sure that the user isn't already associated with that item. So I don't think I need the primary key for enforcing this rule. Apart from this, is there any other benefit of adding this primary key on userId and itemId? – Click Upvote Jun 24 '12 at 14:06
  • @ClickUpvote: don't rely on the code, make sure you support this requirement with a primary key in the database. Constraints that are only enforced in the application layer will be violated eventually e.g. because of bugs in the application or faulty SQL scripts. It **will** happen. – a_horse_with_no_name Jun 24 '12 at 14:15
  • @ClickUpvote Are you sure you implemented it correctly? Do you lock the data correctly, so **concurrent** insertions don't allow duplicates in? Implementing a PK is actually less trivial than it might seem. Also, even if you do (implement it correctly) now, will the evolution of the application keep it that way forever? Will **all** applications in the future be correct? As a general principle, prefer DBMS-level to application-level integrity whenever you can! – Branko Dimitrijevic Jun 24 '12 at 14:18
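
The concurrency concern raised in these comments can be replayed deterministically. The sketch below is an illustration under assumptions, not the asker's code: it uses Python's sqlite3 module in place of MySQL, two hypothetical tables (`unchecked_pairs` without a key, `keyed_pairs` with one), and simulates two requests whose "does this pair exist?" checks both run before either insert.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; both table names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE unchecked_pairs (userId INTEGER NOT NULL, itemId INTEGER NOT NULL);
    CREATE TABLE keyed_pairs (
        userId INTEGER NOT NULL,
        itemId INTEGER NOT NULL,
        PRIMARY KEY (userId, itemId)
    );
""")


def pair_exists(table, user_id, item_id):
    # Application-level check: look before inserting.
    # The table name is one of the two fixed names above, never user input.
    sql = f"SELECT 1 FROM {table} WHERE userId = ? AND itemId = ?"
    return conn.execute(sql, (user_id, item_id)).fetchone() is not None


def interleaved_inserts(table):
    """Replay two requests that both run their check before either one inserts."""
    request_a_sees_pair = pair_exists(table, 1, 10)
    request_b_sees_pair = pair_exists(table, 1, 10)
    rejected = 0
    for sees_pair in (request_a_sees_pair, request_b_sees_pair):
        if sees_pair:
            continue  # the application check would have stopped this request
        try:
            conn.execute(f"INSERT INTO {table} (userId, itemId) VALUES (?, ?)", (1, 10))
        except sqlite3.IntegrityError:
            rejected += 1  # the database caught what the check missed
    stored = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    return stored, rejected


unchecked_result = interleaved_inserts("unchecked_pairs")  # (rows stored, inserts rejected)
keyed_result = interleaved_inserts("keyed_pairs")
print(unchecked_result, keyed_result)
```

Without the key, both requests pass their check and the pair is stored twice; with the key, the second insert is rejected even though its check passed.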
2

You already have a natural key {userId, itemId}. Unless there is a specific reason to add another (surrogate) key, just use your existing key as primary.

Some reasons for the surrogate may include:

  • Keeping child FKs "slimmer".
  • Elimination of child cascading updates.
  • ORM-friendliness.

I don't think that any of this applies to your case.

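Also, please be aware that InnoDB tables are clustered, and secondary indexes in clustered tables are more expensive than secondary indexes in heap-based tables: every secondary index entry stores a copy of the primary key, and a lookup through a secondary index usually has to traverse the clustered index as well. So ideally, you should avoid secondary indexes whenever you can.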

Branko Dimitrijevic
0

In general, if it adds no real complexity to the code you're writing and the table is expected to stay at or below a few hundred thousand rows (say 100,000-500,000), I'd recommend adding the extra primary key column. I also sometimes recommend adding created_at and updated_at columns.

Yes, they require more storage -- but it's minimal. There's also the issue that the primary key index will have to be maintained and so inserts and updates may be slower if the table becomes large. But unless the table is large (hundreds of thousands or millions of rows), it will probably make no difference in processing speed.

So unless the table is going to be quite large, the space and processing speed impact are insignificant -- so you make the decision on how much effort it takes to maintain it and the potential utility it provides. If it takes very little extra code to do, then virtually any utility it provides might make it worthwhile.

One of the best reasons to have a primary key is to give the rows a natural order based on the order they were inserted. If you ever want to retrieve the last 100 (or first 100) rows added, it's very simple and fast if you have an auto-increment primary key on the table.
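
A minimal sketch of the "last 100 rows" query described above, assuming an auto-increment `id` column is added to user_items. SQLite (via Python's sqlite3 module) is used here only as a runnable stand-in for MySQL, and the sample data is made up for the demonstration.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the id column is the proposed third column.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE user_items (
        id     INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key, grows with each insert
        userId INTEGER NOT NULL,
        itemId INTEGER NOT NULL,
        UNIQUE (userId, itemId)  -- the natural key still needs its own constraint
    )
""")

# Made-up sample data: 5 users x 100 items = 500 rows, inserted in this order.
conn.executemany(
    "INSERT INTO user_items (userId, itemId) VALUES (?, ?)",
    [(user_id, item_id) for user_id in range(1, 6) for item_id in range(1, 101)],
)
conn.commit()

# Newest rows first: walk the primary key backwards and stop after 100.
latest = conn.execute(
    "SELECT id, userId, itemId FROM user_items ORDER BY id DESC LIMIT 100"
).fetchall()
print(len(latest), latest[0], latest[-1])
```

Note that the ids reflect the order in which they were allocated, which under concurrent transactions may differ slightly from the order in which rows were committed.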

Adding created_at and updated_at columns can provide similar utility in terms of fetching data based on date ranges. Again, unless the number of rows is going to be very large, it may be worth evaluating these as well.

Kevin Bedell