
As far as I was aware, for MS SQL, PostgreSQL, and even MySQL databases (so, I assumed, for RDBMS engines in general), you cannot simply back up the file system they are hosted on; you need to do an SQL-level backup to have any hope of internal consistency, and therefore any ability to actually restore.

But then answers like this and indeed the official docs referenced seem to suggest that one can just tar away on database data:

```shell
docker run --volumes-from dbdata -v $(pwd):/backup ubuntu tar cvf /backup/backup.tar /dbdata
```

These two ideas seem at odds with one another. Is there something special about how Docker works that makes it unnecessary to use SQL-level backups? If not, what am I missing in my understanding? (Why is something used as the official example when you can't use it to back up a production database? That can't be right...)

Kev

1 Answer


Under certain circumstances, it is safe to back up the on-disk image of a database:

  • The database server is not running.
  • All persistent data is on the disk system(s) being backed up (logs, tablespaces, temporary storage).
  • All components are restored together.
  • You are restoring the image to the same server on the same path.

The last condition is important, because some aspects of the database configuration may be stored in operating system files.
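Under those conditions, a cold backup along the lines of the Docker docs' example might look like the following sketch. The container name `dbserver` and data container `dbdata` are placeholders for illustration, not names from the question:

```shell
# Stop the database server so the on-disk files are quiescent and consistent.
docker stop dbserver

# Archive the data volume while nothing is writing to it.
docker run --rm --volumes-from dbdata -v "$(pwd)":/backup ubuntu \
    tar cvf /backup/backup.tar /dbdata

# Bring the database back up once the archive is complete.
docker start dbserver
```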

Whenever the server is running, you need to do the backup through the database itself. The server is responsible for the internal consistency of the data, and a disk image taken while it runs may be incomplete or unrecoverable. If the server is not running, then the state of the database in persistent storage should be consistent.
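To illustrate why the engine has to be involved while it is running: most databases expose an online-backup API that produces a consistent copy even under concurrent writes. A minimal sketch using SQLite (chosen here only because it is self-contained; the question's engines have analogous facilities such as `pg_dump` or `BACKUP DATABASE`):

```python
import sqlite3

# A "live" database with some data (stand-in for a running server).
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")
src.execute("INSERT INTO t (val) VALUES ('a'), ('b')")
src.commit()

# Ask the engine itself for a transactionally consistent copy,
# rather than copying its files out from under it.
dst = sqlite3.connect(":memory:")
src.backup(dst)

# The copy is internally consistent and independently queryable.
rows = dst.execute("SELECT val FROM t ORDER BY id").fetchall()
print(rows)
```

The key point is that `backup()` is performed by the engine, which knows about in-flight transactions; a raw file copy taken at the same moment would not.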

Gordon Linoff
  • OK, that's what I thought. So I'm still a bit confused, do professionals normally shut down their databases to take a backup? (Seems unlikely to me...) Or do the Docker docs assume you use multiple containers that replicate each other so that you can shut down one to take a backup this way? (So my current way of doing things is then abnormally unsophisticated, taking just a hot backup of a lone server.) – Kev Jun 25 '15 at 12:42
  • 1
    @Kev: SQL dumps and offline file copying are at the extreme ends of the backup spectrum, but [Postgres](http://www.postgresql.org/docs/9.4/static/backup.html), [MySQL](https://dev.mysql.com/doc/refman/5.5/en/backup-and-recovery.html) and [MSSQL](https://msdn.microsoft.com/en-us/library/ms187048.aspx) all provide a wide variety of alternatives, many of which will work with a live database. – Nick Barnes Jun 25 '15 at 15:20
  • Of particular relevance to your question is [this page](http://www.postgresql.org/docs/9.4/static/backup-file.html), which mentions that you can safely copy a live Postgres instance if your file system supports snapshots. I'm not familiar with Docker, but it [looks like](http://stackoverflow.com/questions/20813486/exploring-docker-containers-file-system) that may be the case here. – Nick Barnes Jun 25 '15 at 15:21
  • @NickBarnes, right, I used to use VSS for MSSQL backups. In the Postgres link you provided I don't see a "wide variety", though; I see WAL, which is essentially the equivalent of MSSQL replication in one of its modes. I guess replication is then in the middle of the spectrum you were referring to, but in terms of redundancy, flexibility, and setup sophistication, it's actually higher than either 'end'. (In general I would also group hot backups, table dumps, and CSV dumps together; I think they all have about the same effect, if done atomically.) – Kev Jun 25 '15 at 15:26