28

I have been browsing this site for the answer but I'm still a little unsure how to plan a similar system in its database structure and implementation.

In PHP and MySQL it would be clear that some achievements are earned immediately (when a specialized action is taken, in SO case: Filled out all profile fields), although I know SO updates and assigns badges after a certain amount of time. With so many users & badges wouldn't this create performance problems (in terms of scale: high number of both users & badges).

So the database structure I assume would something as simple as:

Badges     |    Badges_User      |    User
----------------------------------------------
bd_id      |    bd_id            |  user_id
bd_name    |    user_id          |  etc
bd_desc    |    assigned(bool)   |  
           |    assigned_at      |

But as some people have said it would be better to have an incremental style approach so a user who has 1,000,000 forum posts wont slow any function down.

Would it then be another table for badges that could be incremental or just a 'progress' field in the badges_user table above?

Thanks for reading and please focus on the scalability of the desired system (like SO thousands of users and 20 to 40 badges).

EDIT: to some iron out some confusion I had assigned_at as a Date/Time, the criteria for awarding the badge would be best placed inside prepared queries/functions for each badge wouldn't it? (better flexibility)

bluedaniel
  • 2,027
  • 5
  • 31
  • 47
  • are some more popular badges checked more than others as well? – bluedaniel Nov 16 '09 at 20:49
  • 3
    Could you clarify the need for "assigned(bool)"? Isn't it sort of redundant as you wouldn't have a mapping in the first case unless you are assigned the badge? And why would the number of forum posts matter? – Fredrik Nov 16 '09 at 20:50
  • my first mistake! say you were assigning badges depending on the amount of posts made i.e rewarding high posts with good badges -> incentive to get users to interact – bluedaniel Nov 16 '09 at 20:53
  • 1
    @Fredrik, agreed on "assigned", but "assigned_at" means you can track stuff like the 1,000,000 forum posts badge mentioned. He's basically tracking two kinds of badges in the one table. – da5id Nov 16 '09 at 20:54
  • exactly, thats gotta be a bad idea hasnt it? – bluedaniel Nov 16 '09 at 20:55
  • pedantically praps, but i think it quite clever :) – da5id Nov 16 '09 at 20:56
  • @da5id: I still don't see why the "assigned_at" should make it any slower with a million posts unless he creates a record per post. But that would be a quite incorrect way of solving the problem. Keeping track of the number of posts (per area, topic, whatever) is just a matter of having counters somewhere, they will be equally expensive with one post as with a million posts. – Fredrik Nov 16 '09 at 21:03
  • Addition to my last post, that assigned_at could be the counter but that would basically mean that all users get a badge mapping even if they have not earned it yet and that would scale quite bad with a lot of users and/or a lot of badges. – Fredrik Nov 16 '09 at 21:04
  • @Fredrik, that's not what I meant, sorry for the confusion - see my answer below. – da5id Nov 16 '09 at 21:06
  • @da5id:Gave you +1 for it but I think bluedaniel would benefit from a better description of tables, when they are updated and with what... – Fredrik Nov 16 '09 at 21:22
  • @Fredfrick im unsure how the counter would be just as expensive, having to count all the posts by each user to see if they have earned the '1,000,000 badge' would be more expnesive than checking the counter wouldnt it? – bluedaniel Nov 16 '09 at 21:28
  • And im unsure what to worry about more, size vs performance. ie big counting queries or lots of counters – bluedaniel Nov 16 '09 at 21:29
  • @bluedaniel: There is no need to recount every post every time if you keep track of the number of qualifying posts. Each new post means that you spend a small amount of CPU increasing the counter. It doesn't matter if it is the second or millionth time. – Fredrik Nov 16 '09 at 21:56
  • Sorry, off to a meeting or I'd expand my answer, but basically the idea is to not have to worry about either: Every time a user submits something to the site you increment the Submissions_User table in my answer - immediately after that you run the "event listener" and award a badge if applicable. – da5id Nov 16 '09 at 21:57
  • ...just remember to put indexes on the tables! – da5id Nov 16 '09 at 21:59
  • @Fredfrick exactly so would it be fitting to have two tables, one for incremental badges and one for 'one off' achievements? also what about SO's style of running these queries to assign badges. each function would surely [check if they have the badge already] -> [check if they qualify] -> [assign the badge] but would they run for each user and each badge every few hours? good way or not? – bluedaniel Nov 16 '09 at 22:14
  • @bludanniel: I'm not going to design it for you but it really isn't as hard to do as you make it sound. If a "one off" badge cold be "first post to be a favorite", just keep track of that and give the user the badge if it happens. If there is a badge assigned the first time a user gets 10 upvotes/post, keep track of the maximum number of upvotes/post so far. There is no need to check if the badge is assigned already if you only assign it when you reach a specific limit with a counter. It is late over here and I need to get up really early tomorrow so I can't help you more. – Fredrik Nov 16 '09 at 22:52

4 Answers4

7

regarding the sketch you included: get rid of the boolean column on badges_user. it makes no sense there: that relation is defined in terms of the predicate "user user_id earned the badge bd_id at assigned_at".

as for your overall question: define the schema to be relational without regard for speed first (that'll get you rid of half of potential perf. problems, possibly in exchange for different perf. problems), index it properly (what's proper depends on the query patterns), then if it's slow, derive a (still relational) design from that that's faster. like you may need to have some aggregates precomputed, etc.

just somebody
  • 16,800
  • 5
  • 47
  • 58
7

I think the structure you've suggested (without the "assigned" field as per the comments) would work, with the addition of an additional table, say "Submissions_User", containing a reference to user_id & an incrementing field for counting submissions. Then all you'd need is an "event listener" as per this post and methinks you'd be set.

EDIT: For the achievement badges, run the event listener upon each submission (only for the user making the submission of course), and award any relevant badge on the spot. For the time-based badges, I would run a CRON job each night. Loop through the complete user list once and award badges as applicable.

Community
  • 1
  • 1
da5id
  • 8,890
  • 8
  • 36
  • 52
  • Ok this is the best method, and what about in terms of implementation, as in my question SO awards badges after a set time, so every few hours and is it then [foreach(badge)] -> do all users or [foreach(user)] - > do all badges, does this make a difference? I could then set badges which are less likely to be awarded at longer intervals perhaps? thoughts.. and Ill accept as an answer – bluedaniel Nov 17 '09 at 14:17
7

I would keep a similar type structure to what you have

Badges(badge_id, badge_name, badge_desc)
Users(user_id, etc)
UserBadges(badge_id, user_id, date_awarded)

And then add tracking table(s) depending on what you want to track and @ what detail level... then you can update the table accordingly and set triggers on it to "award" the badges

User_Activity(user_id, posts, upvotes, downvotes, etc...)

You can also track stats from the other direction too and trigger badge awards

Posts(post_id, user_id, upvotes, downvotes, etc...)


Some other good points are made here
Community
  • 1
  • 1
CheeseConQueso
  • 5,493
  • 25
  • 84
  • 126
4

I think this is one of those cases where your many-to-many table (Badges_User) is appropriate.
But with a small alteration so that unassigned badges isn't stored.

I assume assigned_at is a date and/or time.
Default is that the user does not have the badges.

Badges     |    Badges_User      |  User
----------------------------------------------
bd_id      |    bd_id            |  user_id
bd_name    |    user_id          |  etc
bd_desc    |    assigned_at      |  
           |                     |

This way only badges actually awarded is stored.
A Badges_User row is only created when a user gets a badge.

Regards
    Sigersted

Sigersted
  • 346
  • 3
  • 6