Questions tagged [high-availability]

High availability is a software design approach and implementation that ensures a prearranged level of operational performance will be met during a contractual measurement period.
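To make the "prearranged level" concrete, an availability target can be converted into a downtime budget for the measurement period. The following is a minimal sketch; the function name `downtime_budget_minutes` and the example figures (99.9% over 30 days) are illustrative assumptions, not part of any particular SLA.

```python
def downtime_budget_minutes(availability_percent: float, period_days: float) -> float:
    """Return the minutes of downtime allowed by an availability target.

    availability_percent: target such as 99.9 (hypothetical example value)
    period_days: length of the measurement period in days
    """
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    period_minutes = period_days * 24 * 60
    # The budget is the fraction of the period that may be spent unavailable.
    return period_minutes * (1 - availability_percent / 100)


if __name__ == "__main__":
    # A 99.9% target over a 30-day period allows roughly 43.2 minutes of downtime.
    print(round(downtime_budget_minutes(99.9, 30), 1))
```

Each additional "nine" in the target cuts the budget by a factor of ten, which is one reason higher targets tend to require the redundancy approaches listed below.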

Attributes of high availability (HA):

  • Maximum uptime
  • Online maintenance - With little or no service interruption.
  • Simplicity - Complexity is an enemy of reliability and encourages operator error, so it is best avoided (e.g., ask whether a particular use case really requires the burden of implementing HA).

Approaches that increase availability:

  • Fault-tolerance: Duplicate services waiting to take over should the primary fail or become unreachable.
    • Active/Active: (+) enables load balancing; (-) more complicated.
    • Active/Passive: (+) simpler; (-) does not increase load capacity.
  • Replication:
    • Synchronous: (+) safer; (-) slow over longer distances.
    • Asynchronous: (+) faster; (-) possibility of data loss.
      • Favors availability, the "A" in the CAP theorem.
  • Graceful degradation: Rate limiting and client throttling.
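
Rate limiting, the last approach in the list above, is often implemented with a token bucket. The following is a minimal single-process sketch, not a production limiter; the class name `TokenBucket`, its parameters, and the injectable `clock` argument are assumptions made for illustration.

```python
import time


class TokenBucket:
    """Hypothetical token-bucket limiter: refills `rate` tokens per second up to `capacity`."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self.clock = clock          # injectable so the behavior can be tested deterministically
        self.tokens = capacity      # start with a full bucket
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True and spend `cost` tokens if the request may proceed, else False."""
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(rate=5.0, capacity=10.0)
    accepted = sum(bucket.allow() for _ in range(20))
    print(f"accepted {accepted} of 20 burst requests")
```

Rejected requests can be answered with a cheap error or a cached response, so the service degrades under overload instead of failing outright.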
1355 questions
73
votes
11 answers

ZooKeeper alternatives? (cluster coordination service)

ZooKeeper is a highly available coordination service for data centers. It originated in the Hadoop project. One can implement locking, fail over, leader election, group membership and other coordination issues on top of it. Are there any…
52
votes
5 answers

Web App: High Availability / How to prevent a single point of failure?

Can someone explain to me how high-availability ("HA") works for a web application ... because I assume HA means that there exists no single point of failure. However, even if a load balancer is used, isn't that the single point of failure?
nickb
  • 8,430
  • 11
  • 34
  • 46
39
votes
4 answers

Redis master/slave replication - single point of failure?

How does one upgrade to a newer version of Redis with zero downtime? Redis slaves are read-only, so it seems like you'd have to take down the master and your site would be read-only for 45 seconds or more while you waited for it to reload the DB. Is…
nornagon
  • 14,011
  • 16
  • 68
  • 84
34
votes
5 answers

How to Guarantee Message delivery with Celery?

I have a Python application where I want to start doing more work in the background so that it will scale better as it gets busier. In the past I have used Celery for doing normal background tasks, and this has worked well. The only difference…
Ken Cochrane
  • 68,551
  • 9
  • 45
  • 57
28
votes
4 answers

Method to replicate sqlite database across multiple servers

I'm developing a distributed application, and I have a SQLite database that must be shared between distributed servers. If I'm on serverA and change a SQLite row, this change must appear on the other servers instantly, but if a server were…
26
votes
4 answers

Scala + Akka: How to develop a Multi-Machine Highly Available Cluster

We're developing a server system in Scala + Akka for a game that will serve clients in Android, iPhone, and Second Life. There are parts of this server that need to be highly available, running on multiple machines. If one of those servers dies…
Unoti
  • 1,255
  • 10
  • 12
25
votes
5 answers

Design Patterns (or techniques) for Scalability

What design patterns or techniques have you used that are specifically geared toward scalability? Patterns such as the Flyweight pattern seem to me to be a specialized version of the Factory Pattern, to promote high scalability or when working…
Chris Ballance
  • 32,056
  • 25
  • 101
  • 147
20
votes
7 answers

name node Vs secondary name node

Hadoop is consistent and partition tolerant, i.e., it falls under the CP category of the CAP theorem. Hadoop is not available because all the nodes are dependent on the name node. If the name node falls the cluster goes down. But considering the fact…
Sam
  • 2,207
  • 7
  • 31
  • 53
19
votes
2 answers

How to setup Jenkins with HA?

Currently we are using Jenkins as our CI system and there is one master server and slaves which are provisioned by SaltStack on OpenStack. If our Jenkins master server goes down, we need to create a new master and we need to pull the files from…
19
votes
3 answers

Which part of the CAP theorem does Cassandra sacrifice and why?

There is a great talk here about simulating partition issues in Cassandra with Kingsbury's Jepsen library. My question is - with Cassandra are you mainly concerned with the Partitioning part of the CAP theorem, or is Consistency a factor you need to…
hawkeye
  • 31,052
  • 27
  • 133
  • 271
18
votes
1 answer

Why should a production Kubernetes cluster have a minimum of three nodes?

The first section of the official Kubernetes tutorial states that "A Kubernetes cluster that handles production traffic should have a minimum of three nodes" but gives no rationale for why three is preferred. Is three desirable over two in order…
rjs
  • 769
  • 1
  • 9
  • 19
17
votes
13 answers

How do you update a live, busy web site in the politest way possible?

When you roll out changes to a live web site, how do you go about checking that the live system is working correctly? Which tools do you use? Who does it? Do you block access to the site for the testing period? What amount of downtime is…
Tim Booker
  • 2,711
  • 1
  • 23
  • 34
17
votes
3 answers

Why are RDBMS considered Available (CA) for CAP Theorem

If I understand the CAP Theorem correctly, availability means that the cluster continues to operate even if a node goes down. I've seen a lot of people (http://blog.nahurst.com/tag/guide) list RDBMS as CA, but I do not understand how RDBMS is…
PiedPiper
  • 203
  • 3
  • 7
16
votes
7 answers

How to design and verify distributed systems?

I've been working on a project, which is a combination of an application server and an object database, and is currently running on a single machine only. Some time ago I read a paper which describes a distributed relational database, and got some…
Esko Luontola
  • 71,072
  • 15
  • 108
  • 126
16
votes
4 answers

Do load balancers flood?

I am reading about load balancing. I understand the idea that load balancers distribute the load among several slave servers of any given app. However, very little of the literature I can find talks about what happens when the load balancers themselves…
PedroD
  • 4,310
  • 8
  • 38
  • 75