91

I'm building a chat app and I want a full history off all messages ever sent in the chat conversation. At the moment I am storing each message as a single row in a table called 'messages'. I am aware that this table could grow huge as even small messages like 'Hi' would have their own database record.

Can anyone recommend a more scalable mysql solution? I don't require the individual messages to be searchable, editable or deletable. Could the whole conversation be stored in one huge field?

Would love to hear your ideas!

wilsonpage
  • 16,275
  • 19
  • 98
  • 144
  • 13
    if these messages don't need to be searchable or editable, there is no point to keep inside the database – ajreal Aug 15 '11 at 08:54
  • 28
    I would advise to start easy, think simple, use a relational database, and if scaling becomes an issue, deal with it! Too many people care about scenarios which will never occur because they spend too much time building the perfect infrastructure and they will not have the time to focus on what's important. – whirlwin Sep 09 '16 at 18:39

3 Answers3

51

There's nothing wrong with saving the whole history in the database, they are prepared for that kind of tasks.

Actually you can find here in Stack Overflow a link to an example schema for a chat: example

If you are still worried for the size, you could apply some optimizations to group messages, like adding a buffer to your application that you only push after some time (like 1 minute or so); that way you would avoid having only 1 line messages

Community
  • 1
  • 1
jasalguero
  • 3,914
  • 2
  • 27
  • 52
14

If you can avoid the need for concurrent writes to a single file, it sounds like you do not need a database to store the chat messages.

Just append the conversation to a text file (1 file per user\conversation). and have a directory/ file structure

Here's a simplified view of the file structure:

chat-1-bob.txt
        201101011029, hi
        201101011030, fine thanks.

chat-1-jen.txt
        201101011030, how are you?
        201101011035, have you spoken to bill recently?

chat-2-bob.txt
        201101021200, hi
        201101021222, about 12:22
chat-2-bill.txt
        201101021201, Hey Bob,
        201101021203, what time do you call this?

You would then only need to store the userid, conversation id (guid ?) & a reference to the file name.

I think you will find it hard to get a more simple scaleable solution.

You can use LOAD_FILE to get the data too see: http://dev.mysql.com/doc/refman/5.0/en/string-functions.html

If you have a requirement to rebuild a conversation you will need to put a value (date time) alongside your sent chat message (in the file) to allow you to merge & sort the files, but at this point it is probably a good idea to consider using a database.

Kevin Burton
  • 10,374
  • 2
  • 19
  • 33
  • 2
    This sounds brilliant. Can anyone counter this argument? – Trip Jan 22 '16 at 22:14
  • 95
    Writing to a file is a terrible idea. In most server-side environments or cluster you wouldn't even be to guarantee your 2nd request even ended up on the same server as the file. Writing the file system is extremely slow and I/O bound. Sorry, I can't believe this got so many up votes. – Andy Fusniak Jun 20 '17 at 22:16
  • 8
    sorry I was actually answering the question not making up fictional scenarios. At the moment the messages are persisted to a database, so why would a simple file system write be much slower. Also please read my answer 1 file per user\conversation !!! (on your fictional cluster I have installed a FSA-SAN).The OPs requirement sounds like logging \ audit, to my mind that belongs in a file. – Kevin Burton Jun 21 '17 at 12:46
  • 8
    Writing and reading to a file is resource intensive. I think using a database of any kind should help reduce that resource delay. At the end of the day, databases store those information to file as well (just a bit differently). I think the given idea is fantastic to store archived chats, or chats older than 1 year or so. But nothing beats a simple db here. – Jay Patel - PayPal May 14 '18 at 17:43
  • 1
    The OP's vague requirement was to store ' a full history of all messages ever sent'.... not 'searchable, editable or deletable' this still sounds like simple logging\audit to me. It sounds like you are making assumptions as to how the chat app works, you have invented a need to reduce the reads by indexing the messages so you can justify a database, you can see why such vague questions get closed ;) – Kevin Burton May 15 '18 at 10:38
  • 5
    OP clearly says in database, apart from this terrible idea, this not answer the question – Lyoneel Dec 28 '18 at 17:35
  • In the first line I do show i understand the OP's question. Your response is just an opinion, and does not help anyone, unless you can back it up with a well reasoned argument. If you can clearly explain why it is so terrible I will remove (or improve). – Kevin Burton Dec 31 '18 at 10:23
  • If you grow up in the early days of learning coding, the first thing you would have learned that fine operations are expensive and using them for chat messages is the worst idea anyone can give to you. Never use files for storing chat messages. Storing them in relational database is the ultimate solution. It will grow up. I started a chatting app and it has grown too big with over million messages. MySQL is still holding up so no worries for me atm. You would think of optimizing database once it starts growing. – Dan Jul 01 '20 at 00:29
3

You could create a database for x conversations which contains all messages of these conversations. This would allow you to add a new Database (or server) each time x exceeds. X is the number conversations your infrastructure supports (depending on your hardware,...).

The problem is still, that there may be big conversations (with a lot of messages) on the same database. e.g. you have database A and database B an each stores e.g. 1000 conversations. It my be possible that there are far more "big" conversations on server A than on server B (since this is user created content). You could add a "master" database that contains a lookup, on which database/server the single conversations can be found (or you have a schema to assign a database from hash/modulo or something).

Maybe you can find real world architectures that deal with the same problems (you may not be the first one), and that have already been solved.

Bernhard Kircher
  • 3,892
  • 3
  • 29
  • 38