1022

How do people deal with persistent storage for your Docker containers?

I am currently using this approach: build the image, e.g. for PostgreSQL, and then start the container with

docker run --volumes-from c0dbc34fd631 -d app_name/postgres

IMHO, that has the drawback that I must never (even by accident) delete the container "c0dbc34fd631".

Another idea would be to mount host volumes into the container with "-v". However, the user ID within the container does not necessarily match the user ID on the host, and permissions might then get messed up.

Note: Instead of --volumes-from 'cryptic_id' you can also use --volumes-from my-data-container, where my-data-container is a name you assigned to a data-only container, e.g. docker run --name my-data-container ... (see the accepted answer).
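For illustration, a minimal sketch of that pattern (the volume path is an assumption; it should match wherever your image stores its data):

# Data-only container that does nothing but hold the volume
docker run --name my-data-container -v /var/lib/postgresql/data busybox true

# Application container referencing the data container by name instead of an ID
docker run --volumes-from my-data-container -d app_name/postgres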

juwalter
  • Sorry, I phrased that wrongly, I meant to say: all my future instances from that image depend on that container. If I delete that container by accident, I am in trouble. – juwalter Sep 17 '13 at 09:50
  • @AntonStrogonoff - yep, phrasing error - I meant to say: I need to make sure I won't ever delete that (possibly) old container, because then the reference to the "persistent" storage would also be gone – juwalter Dec 15 '13 at 14:27
  • it should be `--name`. you have `-name` – Shammel Lee Nov 09 '16 at 08:02
  • Related https://stackoverflow.com/a/35598694/1824361 – ahmednabil88 Oct 06 '20 at 22:41

15 Answers

996

Docker 1.9.0 and above

Use the volume API

docker volume create --name hello
docker run -d -v hello:/container/path/for/volume container_image my_command

This means that the data-only container pattern must be abandoned in favour of the new volumes.

Actually, the volume API is just a better way to achieve what the data-container pattern provided.

If you create a container with a -v volume_name:/container/fs/path, Docker will automatically create a named volume for you that can:

  1. Be listed through docker volume ls
  2. Be identified through docker volume inspect volume_name
  3. Be backed up as a normal directory (see the sketch after this list)
  4. Be backed up as before through a --volumes-from connection
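For point 3, a minimal backup sketch (the tar file name and mount points are arbitrary), reusing the hello volume created above:

# Back up the named volume "hello" into a tar file in the current directory
docker run --rm -v hello:/volume -v $(pwd):/backup busybox tar cvf /backup/hello-backup.tar /volume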

The new volume API adds a useful command that lets you identify dangling volumes:

docker volume ls -f dangling=true

And then remove a dangling volume by its name:

docker volume rm <volume name>

As @mpugach points out in the comments, you can get rid of all the dangling volumes with a nice one-liner:

docker volume rm $(docker volume ls -f dangling=true -q)
# Or using 1.13.x
docker volume prune

Docker 1.8.x and below

The approach that seems to work best for production is to use a data-only container.

The data-only container is run from a barebones image and actually does nothing except expose a data volume.
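For example, such a container might be created like this (the image and volume path are only illustrative):

docker run --name data-container -v /data busybox true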

Then you can run any other container to have access to the data container volumes:

docker run --volumes-from data-container some-other-container command-to-execute
  • Here you can get a good picture of how to arrange the different containers.
  • Here there is a good insight on how volumes work.

In this blog post there is a good description of the so-called container-as-volume pattern, which clarifies the main point of having data-only containers.

The Docker documentation now has the DEFINITIVE description of the container-as-volume(s) pattern.

Following is the backup/restore procedure for Docker 1.8.x and below.

BACKUP:

sudo docker run --rm --volumes-from DATA -v $(pwd):/backup busybox tar cvf /backup/backup.tar /data
  • --rm: remove the container when it exits
  • --volumes-from DATA: attach to the volumes shared by the DATA container
  • -v $(pwd):/backup: bind mount the current directory into the container, to write the tar file to
  • busybox: a small, simple image - good for quick maintenance
  • tar cvf /backup/backup.tar /data: creates an uncompressed tar file of all the files in the /data directory

RESTORE:

# Create a new data container
$ sudo docker run -v /data --name DATA2 busybox true
# untar the backup files into the new container's data volume
$ sudo docker run --rm --volumes-from DATA2 -v $(pwd):/backup busybox tar xvf /backup/backup.tar
data/
data/sven.txt
# Compare to the original container
$ sudo docker run --rm --volumes-from DATA -v `pwd`:/backup busybox ls /data
sven.txt

Here is a nice article from the excellent Brian Goff explaining why it is good to use the same image for a container and a data container.

tommasop
  • Thanks Tom for sharing! I like the data-only container approach very much - especially with the "-name" flag when runing/creating it. – juwalter Dec 18 '13 at 11:20
  • Is this method preferred over -link? linking seems relatively new, but I think it remedies the sharing of data without the need for a separate data container. – bcorso May 21 '14 at 06:01
  • It's a different tool for a different need. `--volumes-from` lets you share disk space, `--link` lets you share services. – tommasop May 21 '14 at 08:47
  • @tommasop I was wondering if I need a data container or can I just use a directory on the machine running the docker container? So that when I run `docker -v /home/localUser/directoryToKeepPersistance:/opt/application/directoryOnconatiner container-name` everything that would be persisted in `/opt/application/directoryOnconatiner` will be persisted on my hard disk and reused when the process starts again. – Thomas Jun 20 '14 at 13:46
  • You don't need a data container if it's easier for you to work with a host directory. Or you can use a data container linked to a local host directory if you want to be able to link the same data to different containers. – tommasop Jun 20 '14 at 14:01
  • There is another project in the works specifically meant for this kind of thing, maybe add it to this answer as a reference to watch? https://github.com/ClusterHQ/flocker – Andre Jul 14 '14 at 15:05
  • I think flocker goes way beyond the scope of the answer. It is more an orchestration tool. – tommasop Jul 14 '14 at 15:27
  • A useful example of how to use data-containers in practice here: http://stackoverflow.com/a/27021154/430128 – Raman Nov 20 '14 at 15:34
  • Is that supposed to be `--rm` or `-rm`? – 425nesp May 08 '15 at 04:46
  • --rm thanks! I forgot to change it according to the new -- style – tommasop May 08 '15 at 08:25
  • Data containers don't have any meaning and are a really bad idea! A container only means something when a process is running in it; otherwise it's just a piece of the host file system. You can just mount a volume with -v; that's the only and best option. You have control over the filesystem and physical disk you use. – Boynux May 08 '15 at 17:28
  • I agree with @Boynux. Data-only containers are asking for trouble. If you for some reason have to remove the data-only container (for instance if you need to move it to a different host) your only reference to the data is lost. The data is not physically removed from the hard drive, but there's no way you can recreate the data-only container again. Using host volumes you avoid this trap. – Alex Jul 15 '15 at 22:37
  • @Boynux and Alex Data containers do have a meaning which maybe is not useful for your personal use case http://docs.docker.com/userguide/dockervolumes/#data-volumes – tommasop Jul 16 '15 at 09:42
  • @Boynux data containers are most certainly good. You gain the most important thing: independence from the host system!! Now you can distribute and move your application as you want, without worrying that certain folders need to exist on the hosting system. – Spock Aug 01 '15 at 19:19
  • @Spock This is going to be a long discussion and maybe here is not the right place. But look, beside the fact that containers are just a kernel namespace around a process, what problem are containers supposed to solve? I tell you: mutable infrastructure. And how do they do that? With immutable containers. And what do we do with so-called "data containers"? We create mutable containers! This is a pure contradiction. The aforementioned problem can be addressed with Docker storage drivers which are coming soon. Meanwhile, say no-no to data containers! – Boynux Aug 03 '15 at 13:46
  • Good points.. And yes the storage drivers will help alleviate some of the issues – Spock Aug 03 '15 at 13:51
  • I would also like to see the section about Docker 1.9 expanded on. There is [an issue](https://github.com/docker/docker/issues/17798) on the Docker GitHub which specifies that as of 1.9.0 data-only containers are now obsolete in favour of docker volume. The [documentation](http://docs.docker.com/v1.9/engine/userguide/dockervolumes/) for Docker 1.9 omits any mention of this fact or its implications. – maaarghk Nov 16 '15 at 14:24
  • voilà ;) updated as requested, I also updated the API link to reflect the actual documentation – tommasop Nov 16 '15 at 19:04
  • On the Docker 1.9.x introduction of volume management; it appears to be removing volumes on container removal for me. http://stackoverflow.com/q/34957586/1254292 – gertvdijk Jan 22 '16 at 23:28
  • The question you are referencing has been removed but anyhow no, volumes created through the volume api are not removed on container removal – tommasop Jan 23 '16 at 06:54
  • Hey @tommasop I wanted to let you know that Flocker that was mentioned in the above comments by has deprecated its container API, the reason you referred to it as an orchestration tool. Now Flocker is only a volume manger for handling persistent storage in dbs and is designed to work w/ a orchestration tool like swarm to manage data volumes. Here are the docs about using Flocker via Docker Swarm, Kubernetes or Mesos https://docs.clusterhq.com/en/latest/index.html – ferrantim Feb 12 '16 at 11:18
  • @ferrantim thanks for the update I'll check it out and eventually update the answer accordingly. – tommasop Feb 12 '16 at 13:27
  • Yep, as of Docker 1.9, creating named volumes with the volumes API (`docker volume create --name mydata`) is preferred over a data volume container. Folks at Docker themselves suggest that data volume containers “[are no longer considered a recommended pattern](https://github.com/docker/docker/issues/20465),” “[named volumes should be able to replace data-only volumes in most (if not all) cases](https://github.com/docker/docker/issues/17798),” and “[no reason I can see to use data-only containers](https://github.com/docker/docker/issues/17798).” – Quinn Comendant Feb 27 '16 at 20:50
  • So sad that this has so many votes and is the "accepted answer" on Google. Partially because it's so freaking difficult to get down-vote capability on SO. – coding Apr 24 '16 at 19:16
  • @coding, I'm sad you're sad, partially because you are judging answers with a 3 years' delay and partially because the answer is substantially right in all its history. If you have any advice feel free to comment so that I can integrate the answer and help people not be sad. – tommasop Apr 25 '16 at 01:34
  • @coding why would you want to down vote it? I think tommasop did a great job in keeping this up to date. – β.εηοιτ.βε May 09 '16 at 21:27
  • `docker volume rm $(docker volume ls -f dangling=true -q)` – mpugach Oct 18 '16 at 05:43
78

As of Docker release v1.0, a file or directory on the host machine can be bind mounted into a container with the following command:

$ docker run -v /host:/container ...

The above volume can be used as persistent storage on the host running Docker.

amitmula
  • This should be the recommended answer as it is far less complex than the volume-container approach that has more votes at the moment – insitusec Nov 24 '16 at 16:15
  • I wish there was a flag to specify a host-uid : container-uid and host-gid : container-gid mapping when using this volume mount command. – rampion Mar 29 '17 at 15:48
36

As of Docker Compose 1.6, there is improved support for data volumes in Docker Compose. The following compose file will create a data volume that persists between restarts (or even removal) of the parent containers.

Here is the blog announcement: Compose 1.6: New Compose file for defining networks and volumes

Here's an example compose file:

version: "2"

services:
  db:
    restart: on-failure:10
    image: postgres:9.4
    volumes:
      - "db-data:/var/lib/postgresql/data"
  web:
    restart: on-failure:10
    build: .
    command: gunicorn mypythonapp.wsgi:application -b :8000 --reload
    volumes:
      - .:/code
    ports:
      - "8000:8000"
    links:
      - db

volumes:
  db-data:

As far as I can understand: this will create a named data volume (db-data) which will persist between restarts.

If you run: docker volume ls you should see your volume listed:

local               mypthonapp_db-data
...

You can get some more details about the data volume:

docker volume inspect mypthonapp_db-data
[
  {
    "Name": "mypthonapp_db-data",
    "Driver": "local",
    "Mountpoint": "/mnt/sda1/var/lib/docker/volumes/mypthonapp_db-data/_data"
  }
]

Some testing:

# Start the containers
docker-compose up -d

# .. input some data into the database
docker-compose run --rm web python manage.py migrate
docker-compose run --rm web python manage.py createsuperuser
...

# Stop and remove the containers:
docker-compose stop
docker-compose rm -f

# Start it back up again
docker-compose up -d

# Verify the data is still there
...
(it is)

# Stop and remove with the -v (volumes) flag:

docker-compose stop
docker-compose rm -f -v

# Up again ..
docker-compose up -d

# Check the data is still there:
...
(it is).

Notes:

  • You can also specify various drivers in the volumes block. For example, you could specify the Flocker driver for db-data:

    volumes:
      db-data:
        driver: flocker
    
  • As the integration between Docker Swarm and Docker Compose improves (and Flocker is possibly integrated into the Docker ecosystem - I heard a rumor that Docker has bought Flocker), I think this approach should become increasingly powerful.

Disclaimer: This approach is promising, and I'm using it successfully in a development environment. I would be apprehensive to use this in production just yet!

toast38coza
  • Flocker has been [shut down](https://clusterhq.com/2016/12/22/clusterf-ed/) and there isn't a lot of activity on the [github repo](https://github.com/ClusterHQ/flocker) – Krishna Mar 29 '17 at 13:17
17

In case it is not clear from update 5 of the selected answer, as of Docker 1.9, you can create volumes that can exist without being associated with a specific container, thus making the "data-only container" pattern obsolete.

See Data-only containers obsolete with docker 1.9.0? #17798.

I think the Docker maintainers realized the data-only container pattern was a bit of a design smell and decided to make volumes a separate entity that can exist without an associated container.
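A minimal sketch of what that independence looks like in practice (the volume name and file are arbitrary):

docker volume create --name app-data
docker run --rm -v app-data:/data busybox sh -c 'echo hello > /data/greeting'
# The container is gone, but the volume and the data in it remain
docker volume ls
docker run --rm -v app-data:/data busybox cat /data/greeting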

ben_frankly
13

While this is still a part of Docker that needs some work, you should put the volume in the Dockerfile with the VOLUME instruction so you don't need to copy the volumes from another container.

That will make your containers less inter-dependent and you don't have to worry about the deletion of one container affecting another.
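For example, a minimal sketch of such a Dockerfile (the base image and path are just examples):

FROM postgres:9.4

# Declare the data directory as a volume so Docker creates and tracks it
# automatically for every container started from this image
VOLUME /var/lib/postgresql/data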

Tim Dorr
  • The flip-side argument is that the "data only" containers end up being the last-resort reference to the data volume (Docker destroys data volumes once the last container referencing that volume is removed with `docker rm`) – WineSoaked Oct 01 '14 at 05:33
  • This official guide from Docker suggests otherwise: http://docs.docker.com/userguide/dockervolumes/#backup-restore-or-migrate-data-volumes "Data volumes are designed to persist data, independent of the container’s life cycle. Docker therefore never automatically delete volumes when you remove a container, nor will it “garbage collect” volumes that are no longer referenced by a container." – Alex Jul 15 '15 at 21:57
13

When using Docker Compose, simply attach a named volume, for example:

version: '2'
services:
  db:
    image: mysql:5.6
    volumes:
      - db_data:/var/lib/mysql:rw
    environment:
      MYSQL_ROOT_PASSWORD: root
volumes:
  db_data:
Arsen Khachaturyan
Czar Pino
9

@tommasop's answer is good, and explains some of the mechanics of using data-only containers. But as someone who initially thought that data containers were silly when one could just bind mount a volume to the host (as suggested by several other answers), and who now realizes that data-only containers are in fact pretty neat, I can suggest my own blog post on this topic: Why Docker Data Containers (Volumes!) are Good

See also: my answer to the question "What is the (best) way to manage permissions for Docker shared volumes?" for an example of how to use data containers to avoid problems like permissions and uid/gid mapping with the host.
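For instance, the ownership of the shared data can be set once, inside the data container, to whatever UID/GID the application container expects (the UID 999 here is hypothetical):

# Create the data container and set ownership of the volume in one step
docker run --name data-container -v /data busybox chown -R 999:999 /data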

To address one of the OP's original concerns: that the data container must not be deleted. Even if the data container is deleted, the data itself will not be lost as long as any container has a reference to that volume, i.e. any container that mounted the volume via --volumes-from. So unless all the related containers are stopped and deleted (one could consider this the equivalent of an accidental rm -fr /), the data is safe. You can always recreate the data container by running a new container with --volumes-from pointing at any container that still has a reference to that volume.
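A minimal sketch of that recreation step (the container names are hypothetical):

# Any container that mounted the volume can seed a fresh data container
docker run --name new-data-container --volumes-from app-container busybox true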

As always, make backups though!

UPDATE: Docker now has volumes that can be managed independently of containers, which makes this even easier to manage.

Raman
9

There are several levels of managing persistent data, depending on your needs:

  • Store it on your host
    • Use the flag -v host-path:container-path to persist container directory data to a host directory (see the sketch after this list).
    • Backups/restores happen by running a backup/restore container (such as tutumcloud/dockup) mounted to the same directory.
  • Create a data container and mount its volumes to your application container
    • Create a container that exports a data volume, use --volumes-from to mount that data into your application container.
    • Backup/restore the same as the above solution.
  • Use a Docker volume plugin that backs an external/third-party service
    • Docker volume plugins allow your datasource to come from anywhere - NFS, AWS (S3, EFS, and EBS)
    • Depending on the plugin/service, you can attach single or multiple containers to a single volume.
    • Depending on the service, backups/restores may be automated for you.
    • While this can be cumbersome to do manually, some orchestration solutions - such as Rancher - have it baked in and simple to use.
    • Convoy is the easiest solution for doing this manually.
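As a rough sketch of the first option (the host path and image version are assumptions, and a plain tar command stands in for a dedicated backup image such as tutumcloud/dockup):

# Persist the container's data directory to a host directory
docker run -d --name db -v /opt/pg-data:/var/lib/postgresql/data postgres:9.4

# Back up by running a throwaway container mounted on the same host directory
docker run --rm -v /opt/pg-data:/data -v $(pwd):/backup busybox tar czf /backup/pg-data.tar.gz /data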
Will Stern
8

If you want to move your volumes around you should also look at Flocker.

From the README:

Flocker is a data volume manager and multi-host Docker cluster management tool. With it you can control your data using the same tools you use for your stateless applications by harnessing the power of ZFS on Linux.

This means that you can run your databases, queues and key-value stores in Docker and move them around as easily as the rest of your application.

Community
  • Thanks Johann. I work at ClusterHQ and I just wanted to note that we've moved beyond only ZFS-based storage. You can now use Flocker with storage like Amazon EBS or Google Persistent Disk. Here is a complete list of storage options: https://docs.clusterhq.com/en/latest/supported/index.html#iaas-block-storage – ferrantim Feb 12 '16 at 11:12
  • Flocker has been discontinued and should not be used https://portworx.com/helping-clusterhq-flocker-customers-move-forward-2017/ – jesugmz Sep 09 '18 at 00:41
6

It depends on your scenario (this isn't really suitable for a production environment), but here is one way:

Creating a MySQL Docker Container

The gist of it is to use a directory on your host for data persistence.
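In that spirit, a minimal sketch (the host path and password are placeholders; /var/lib/mysql is where the official MySQL image keeps its data):

docker run -d --name mysql \
  -e MYSQL_ROOT_PASSWORD=secret \
  -v /opt/mysql-data:/var/lib/mysql \
  mysql:5.6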

ben schwartz
  • Thanks Ben, however - one of the issues I can see with this approach: the file system resource (directory, files) would be owned by a uid from within the docker/lxc container (guest) - one that might possibly collide with a uid on the host ... – juwalter Dec 15 '13 at 14:31
  • I think you're pretty safe as it is run by root, but I agree it is a hack - suitable for local dev / ephemeral integration testing at best. This is definitely an area I'd like to see more patterns / thinking emerge. You should check out / post this question to the docker-dev Google group – ben schwartz Dec 18 '13 at 04:25
  • Ben, thanks for this solution! I wouldn't call it a hack though, it seems much more reliable than *container as volume*. Do you see any drawbacks in case when data is used solely from the container? (UID doesn't matter in this case) – johndodo Aug 18 '14 at 14:07
3

I recently wrote about a potential solution and an application demonstrating the technique. I find it to be pretty efficient during development and in production. Hope it helps or sparks some ideas.

Repo: https://github.com/LevInteractive/docker-nodejs-example
Article: http://lev-interactive.com/2015/03/30/docker-load-balanced-mongodb-persistence/

slth
2

I'm just using a predefined directory on the host to persist data for PostgreSQL. Also, this way it is possible to easily migrate existing PostgreSQL installations to Docker containers: https://crondev.com/persistent-postgresql-inside-docker/

Alen Komljen
0

My solution is to make use of the new docker cp, which is now able to copy data out of containers whether they are running or not, and to share a host volume to the exact same location where the database application creates its database files inside the container. This double solution works without a data-only container, straight from the original database container.

So my systemd init script takes care of backing up the database into an archive on the host. I placed a timestamp in the filename so that a file is never overwritten.

It does this in ExecStartPre:

ExecStartPre=-/usr/bin/docker cp lanti-debian-mariadb:/var/lib/mysql /home/core/sql
ExecStartPre=-/bin/bash -c '/usr/bin/tar -zcvf /home/core/sql/sqlbackup_$$(date +%%Y-%%m-%%d_%%H-%%M-%%S)_ExecStartPre.tar.gz /home/core/sql/mysql --remove-files'

And it does the same thing in ExecStopPost too:

ExecStopPost=-/usr/bin/docker cp lanti-debian-mariadb:/var/lib/mysql /home/core/sql
ExecStopPost=-/bin/bash -c 'tar -zcvf /home/core/sql/sqlbackup_$$(date +%%Y-%%m-%%d_%%H-%%M-%%S)_ExecStopPost.tar.gz /home/core/sql/mysql --remove-files'

Plus I exposed a folder from the host as a volume to the exact same location where the database is stored:

mariadb:
  build: ./mariadb
  volumes:
    - $HOME/server/mysql/:/var/lib/mysql/:rw

It works great on my VM (I am building a LEMP stack for myself): https://github.com/DJviolin/LEMP

But I just don't know if it is a "bulletproof" solution when your life actually depends on it (for example, a webshop with transactions at any possible millisecond)?

At 20 min 20 secs from this official Docker keynote video, the presenter does the same thing with the database:

Getting Started with Docker

"For the database we have a volume, so we can make sure that, as the database goes up and down, we don't loose data, when the database container stopped."

Lanti
0

Use a Persistent Volume Claim (PVC) from Kubernetes, a container management and scheduling tool:

Persistent Volumes

The advantages of using Kubernetes for this purpose are that:

  • You can use any storage, like NFS or other storage, and even when the node is down, the storage need not be.
  • Moreover, the data in such volumes can be configured to be retained even after the container itself is destroyed - so that it can be reclaimed, if necessary, by another container.
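For reference, a minimal PersistentVolumeClaim sketch (the name and requested size are placeholders):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi

The claim can then be mounted as a volume in a pod, and it outlives the pods that use it.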
Santanu Dey
0

To preserve or store database data, make sure your docker-compose.yml looks like the following if you want to use a Dockerfile:

version: '3.1'

services:
  php:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - 80:80
    volumes:
      - ./src:/var/www/html/
  db:
    image: mysql
    command: --default-authentication-plugin=mysql_native_password
    restart: always
    environment:
      MYSQL_ROOT_PASSWORD: example
    volumes:
      - mysql-data:/var/lib/mysql

  adminer:
    image: adminer
    restart: always
    ports:
      - 8080:8080
volumes:
  mysql-data:

Your docker-compose.yml will look like the following if you want to use an image instead of a Dockerfile:

version: '3.1'   

services:
  php:
    image: php:7.4-apache
    ports:
      - 80:80
    volumes:
      - ./src:/var/www/html/
  db:
    image: mysql
    command: --default-authentication-plugin=mysql_native_password
    restart: always
    environment:
      MYSQL_ROOT_PASSWORD: example
    volumes:
      - mysql-data:/var/lib/mysql

  adminer:
    image: adminer
    restart: always
    ports:
      - 8080:8080
volumes:
  mysql-data:

If you want to store or preserve MySQL data, you must remember to add these two sections to your docker-compose.yml:

volumes:
  - mysql-data:/var/lib/mysql

and

volumes:
  mysql-data:

After that, use this command:

docker-compose up -d

Now your data will persist and will not be deleted even after using this command:

docker-compose down

Extra: if you want to delete all data, then use

docker-compose down -v

Plus, you can check your volume list (including the database volume) by using this command:

docker volume ls

DRIVER              VOLUME NAME
local               35c819179d883cf8a4355ae2ce391844fcaa534cb71dc9a3fd5c6a4ed862b0d4
local               133db2cc48919575fc35457d104cb126b1e7eb3792b8e69249c1cfd20826aac4
local               483d7b8fe09d9e96b483295c6e7e4a9d58443b2321e0862818159ba8cf0e1d39
local               725aa19ad0e864688788576c5f46e1f62dfc8cdf154f243d68fa186da04bc5ec
local               de265ce8fc271fc0ae49850650f9d3bf0492b6f58162698c26fce35694e6231c
local               phphelloworld_mysql-data
Hassan Saeed