Using etcd as primary store/database?

Question

Can etcd be used as a reliable database replacement? Since it is distributed and stores key/value pairs persistently, it seems like a great alternative NoSQL database. In addition, it has a great API. Can someone explain why this is not a thing?

I am trying to see if I can use etcd (k8s CRDs) as a database replacement. Can you share your experience with etcd? See https://stackoverflow.com/questions/52565131/can-i-store-my-application-data-in-kubernetes-configuration-resources — Jay Rajput, Sep 29 '18 at 04:38
I found etcd especially useful to store config files / static files which need to be available all the time (like Kubernetes does and the name implies a distributed `/etc` folder => etc + d(istributed) = etcd). By running a multi-node etcd cluster, one can be sure files are available. I would say it highly depends on your use case and the data you want to store. Benchmarks show about 30k queries per second max on etcd. — Plus Ultra, Oct 07 '18 at 22:37
I used etcd for all sorts of config data stuff, and did so for a long time. It's not a generic database, but rather, a key-value database. For data stores which need high-speed distributed access using a model which is based on retrieving values by key or range of keys, possibly with namespacing and granular access control, it's a great option. For models where there is frequent searching of records for a value containing a string, for example, it's not so great. Choose a data store based on how the data will be used. :) — dannysauer, Feb 19 '20 at 16:53
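The access pattern described in the last comment can be sketched with a toy, in-memory ordered key-value store. This is not an etcd client and says nothing about etcd's internals; `OrderedKV` and its methods are hypothetical names used only for illustration. It shows why lookups by key or key prefix are cheap when keys are kept sorted, while searching inside values means scanning every record.

```python
import bisect


class OrderedKV:
    """Toy in-memory stand-in for an ordered key-value store (not an etcd client)."""

    def __init__(self):
        self._keys = []    # keys kept in sorted order
        self._values = {}  # key -> value

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key):
        return self._values.get(key)

    def range_prefix(self, prefix):
        """Return (key, value) pairs whose key starts with prefix.

        Binary search finds the first candidate; matching keys are contiguous.
        """
        start = bisect.bisect_left(self._keys, prefix)
        result = []
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            result.append((key, self._values[key]))
        return result

    def search_values(self, needle):
        """Substring search over values: every record has to be inspected."""
        return [(k, self._values[k]) for k in self._keys if needle in self._values[k]]


store = OrderedKV()
store.put("/config/db/host", "db.internal")
store.put("/config/db/port", "5432")
store.put("/config/web/port", "8080")
store.put("/users/alice", "admin")

db_config = store.range_prefix("/config/db/")  # touches only the matching keys
matches = store.search_values("80")            # scans all four records
```

Namespacing falls out of the key layout: everything under `/config/db/` is one contiguous range.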

score 37 · Answer 1 · answered Dec 14 '16 at 07:25

etcd

etcd is a highly available key-value store which Kubernetes uses for persistent storage of all of its objects like deployment, pod, service information.
etcd has high access control, that it can be accessed only using API in master node. Nodes in the cluster other than master do not have access to etcd store.

nosql database

There are currently more than 255 NoSQL databases, which can be broadly classified into key-value, column, document, and graph based. Considering etcd as a key-value store, let's look at the available NoSQL key-value data stores.
Redis, memcached and memcacheDB are popular key-value stores. These are general-purpose distributed memory caching systems, often used to speed up dynamic database-driven websites by caching data and objects in memory.

Why etcd is not an alternative

etcd data cannot be kept in memory (RAM); it can only be persisted to disk, whereas Redis data can be cached in RAM and also persisted to disk.
etcd does not have various data types. It is made to store only kubernetes objects. But redis and other key-value stores have data-type flexibility.
etcd guarantees only high availability, but does not give you fast querying and indexing. All the NoSQL key-value stores are built with the goal of fast querying and searching.

Even though it may seem obvious that etcd cannot be used as an alternative NoSQL database, I think the above explanation shows why it is not a suitable alternative.

"It is made to store only kubernetes objects" --> this is not true. Although Kubernetes is one of the main users of etcd, that doesn't mean only Kubernetes objects can be stored in etcd. etcd aims more generally at storing data in a distributed environment. — HongKun Yoo, Mar 16 '19 at 10:25
Why do you state that "etcd has high access control, that it can be accessed only using API in master node. Nodes in the cluster other than master do not have access to etcd store"? Deploying your own etcd is as easy as deploying your own database, and access can be provided to whichever entity you want. — Antonin, Jun 06 '19 at 02:48
The cons here are all wrong, probably because the author has only worked with etcd in the context of Kubernetes. etcd works from memory, and only stores the journal on disk. etcd stores data (both key and value) as a binary array; the end user can apply whatever typing they want (often by storing values as JSON). And etcd uses a btree to index the keys, which is the same indexing that most any other DB uses on generic data. It doesn't use SQL, I suppose, but "queries and searches" appropriate for data in a key-value DB are extremely fast in etcd. — dannysauer, Feb 19 '20 at 16:49
This answer should not be considered by anybody. The second part is completely wrong. — Karim Manaouil, Oct 22 '20 at 11:54

score 2 · Answer 2 · answered Oct 30 '17 at 20:53

The only answers I've come to see are those between our ears. I guess we first need to show that it can be done, and what the benefits are.

My colleagues seem to shy away from it because "it's for storing secrets, and common truth". The etcd v3 revision made etcd capable of much more, but the news simply hasn't trickled down yet.

Let's make some showcases and success stories. Personally, I like etcd for the reasons you mentioned, and because of its focus on dependable performance.

score 1 · Answer 3 · answered Jan 12 '20 at 14:29

First, no. etcd is not the next NoSQL replacement. But there are some scenarios where it can come in handy.

Let's imagine you have (configuration) data that is mostly static but may change at runtime. Maybe your frontend needs to know the backend endpoints based on the customer's country to comply with legal requirements, and you know the worldwide rollout is done in phases.

So you could just use a k8s configMap to store the array of data (country -> endpoint) and let your backend watch this configMap for changes. On change, the application just reads in the list and provides a repository to allow access to the data from your service layer. All operations need to be implemented in the repository (search, get, update, ...) but your data will be in memory (probably a linked hash map). So it will be very quick to retrieve (like a local cache).
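A minimal sketch of such a repository follows, assuming the country-to-endpoint map arrives as a JSON string. The class name, the `on_config_change` callback, and the example URLs are hypothetical; wiring the callback to an actual ConfigMap watch is left out here because it depends on the client library you use.

```python
import json
import threading


class EndpointRepository:
    """In-memory country -> endpoint lookup, rebuilt whenever the config changes."""

    def __init__(self):
        self._lock = threading.Lock()
        self._endpoints = {}

    def on_config_change(self, raw_json):
        # Hypothetical callback: invoke it from whatever watch mechanism you use.
        new_endpoints = json.loads(raw_json)
        with self._lock:
            # Replace the whole map at once so readers never see a half-updated state.
            self._endpoints = new_endpoints

    def get(self, country):
        with self._lock:
            return self._endpoints.get(country)

    def search(self, text):
        # All "query" logic lives here, in the application, not in the store.
        with self._lock:
            return sorted(c for c, url in self._endpoints.items() if text in url)


repo = EndpointRepository()
repo.on_config_change('{"DE": "https://eu.example.com", "US": "https://us.example.com"}')
first_lookup = repo.get("FR")  # FR is not rolled out yet

# A later change event adds another country.
repo.on_config_change(
    '{"DE": "https://eu.example.com", "FR": "https://eu.example.com", "US": "https://us.example.com"}'
)
second_lookup = repo.get("FR")
```

Reads are served entirely from local memory; the store is only involved when the data changes.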

If the data is changed by the application, just serialize the list and patch the configMap. Any other application watching the configMap will update its internal state. However, there is no locking, so quick successive changes may result in race conditions.
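One common way to narrow this race without locks is optimistic concurrency: read the value together with a version, and write only if the version is unchanged. Kubernetes objects carry a `resourceVersion` and etcd v3 transactions can compare a key's modification revision, so both can support this pattern. The sketch below is a single-threaded, in-memory stand-in, not client code; `VersionedStore`, `compare_and_swap`, and `update_with_retry` are hypothetical names.

```python
class VersionedStore:
    """Toy in-memory store with a per-key version (not an etcd or Kubernetes client)."""

    def __init__(self):
        self._data = {}  # key -> (version, value)

    def get(self, key):
        # Missing keys are reported as version 0 with no value.
        return self._data.get(key, (0, None))

    def compare_and_swap(self, key, expected_version, new_value):
        current_version, _ = self.get(key)
        if current_version != expected_version:
            return False  # someone else wrote in between; caller must re-read
        self._data[key] = (current_version + 1, new_value)
        return True


def update_with_retry(store, key, modify, max_attempts=5):
    """Read-modify-write loop that retries when the version has moved on."""
    for _ in range(max_attempts):
        version, value = store.get(key)
        if store.compare_and_swap(key, version, modify(value)):
            return True
    return False


store = VersionedStore()
store.compare_and_swap("endpoints", 0, {"DE": "https://eu.example.com"})

# A slow writer reads the current state...
stale_version, stale_value = store.get("endpoints")

# ...while another writer updates the same key first.
update_with_retry(store, "endpoints", lambda v: {**v, "US": "https://us.example.com"})

# The slow writer's update is rejected instead of silently overwriting the newer data.
stale_write_ok = store.compare_and_swap(
    "endpoints", stale_version, {"DE": "https://old.example.com"}
)
```

The rejected writer re-reads and retries, so no update is lost, at the cost of extra round trips under contention.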

A ConfigMap can hold up to 1 MiB of data (etcd's own default request size limit is about 1.5 MiB). That's enough for almost-static data.

Another application might be feature toggles. They do not change that much, but when they do, every application needs to know quickly, and polling sucks.

score 0 · Answer 4 · edited Mar 14 '21 at 02:38

From the etcd.io site:

etcd is a strongly consistent, distributed key-value store that provides a reliable way to store data that needs to be accessed by a distributed system or cluster of machines. It gracefully handles leader elections during network partitions and can tolerate machine failure, even in the leader node.

It has a simple interface using HTTP and JSON. It is NOT just for Kubernetes. Kubernetes is just an example of a critical application that uses it.

You are right, it should be a thing: a nice, reliable data store (replicated using the Raft protocol) with an easy-to-use API and a nice way of telling you when things change. This is great for feature toggles and other items where every component needs to know about a change, and it is much better than putting a trigger in an SQL database and getting it to send an event to an external application, or really horrible polling.

So if you are writing something like the Kubernetes use case, it is perfect: a well-proven store for a distributed application.

If you are writing something very different from the Kubernetes use case, then you are comparing it with all the other NoSQL databases. But it is very different from something like MongoDB, so it may suit you better if MongoDB or similar does not work for you.

Other example users

M3, a large-scale metrics platform for Prometheus created by Uber, uses etcd for rule storage and other functions

Consistency

There is a nice comparison of NoSQL database consistency by Jepsen at https://jepsen.io/analyses

The etcd team sums up their results at https://etcd.io/blog/jepsen-343-results/
