Questions tagged [high-availability]

High availability is a software design approach and implementation that ensures a prearranged level of operational performance will be met during a contractual measurement period.

Attributes of high availability (HA):

  • Maximum uptime
  • Online maintenance - With little or no service interruption.
  • Simplicity - Complexity is the enemy of reliability and invites operator error, so it is best avoided (e.g., does a particular use case really require the burden of implementing HA?).

Approaches that increase availability:

  • Fault-tolerance: Duplicate services waiting to take over should the primary fail or become unreachable.
    • Active/Active: (+) enables load balancing; (-) more complicated.
    • Active/Passive: (+) simpler; (-) does not increase load capacity.
  • Replication:
    • Synchronous: (+) safer; (-) slow over longer distances.
    • Asynchronous: (+) faster; (-) possibility of data loss.
      • Favors availability (the "A" in the CAP theorem) over consistency.
  • Graceful degradation: Rate limiting and client throttling.
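
As an illustration of the graceful-degradation item above, here is a minimal sketch of a token-bucket rate limiter in Python. It is only one of several ways to throttle clients, and the names `TokenBucket`, `allow`, `rate`, `capacity`, and `clock` are hypothetical, chosen for this example rather than taken from any particular library.

```python
import time


class TokenBucket:
    """Hypothetical token-bucket limiter: permits bursts of up to
    `capacity` requests and refills at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)          # tokens added per second
        self.capacity = float(capacity)  # maximum tokens the bucket holds
        self.clock = clock               # injectable clock (assumption, eases testing)
        self.tokens = float(capacity)    # start with a full bucket
        self.last = clock()              # time of the last refill

    def allow(self, cost=1.0):
        """Return True if the request may proceed, False if it should be shed."""
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A caller would typically keep one bucket per client and reject or queue a request when `allow()` returns False, so that overload degrades service for the heaviest clients instead of taking the whole system down.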
1355 questions
15
votes
8 answers

What design patterns are most leveraged in creating high availability applications?

Likewise are there design patterns that should be avoided?
McGovernTheory
  • 6,224
  • 4
  • 38
  • 71
15
votes
2 answers

ElasticSearch 1.6 seems to lose documents during high availability test

As part of an investigation for using ElasticSearch as a reliable document store, from a Java application, I'm running a basic HA test as follows: I set up a minimal cluster using a readily available Docker image of ElasticSearch 1.6…
apanday
  • 481
  • 4
  • 15
15
votes
1 answer

redis: Handling failover?

Redis really seems like a great product with the built-in replication and the amazing speed. After testing it out, it definitely feels like the 2010 replacement of memcached. However, since when normally using memcached, a consistent hashing is…
Industrial
  • 36,181
  • 63
  • 182
  • 286
14
votes
5 answers

How to configure RabbitMQ using Active/Passive High Availability architecture

I'm trying to set up a cluster of RabbitMQ servers, to get highly available queues using an active/passive server architecture. I'm following these guides:…
rhodan
  • 317
  • 1
  • 5
  • 8
14
votes
3 answers

Failover & Disaster Recovery

What's the difference between failover and disaster recovery?
coder
  • 2,101
  • 5
  • 21
  • 20
13
votes
5 answers

Zero downtime deployment for Java apps

I am trying to build a very lightweight solution for zero downtime deployment for Java apps. For the sake of simplicity, let's assume that we have two servers. My solution is to use: On the "front" -- some load balancer (software) - I am thinking…
alexeypro
  • 3,393
  • 7
  • 34
  • 48
13
votes
3 answers

Looking for a scalable "at" implementation

I'm looking for a scalable "at" replacement, with high availability. It must support adding and removing jobs at runtime. Some background: I have an application where I trigger millions of events, each event occurs just once. I don't need cron like…
David Rabinowitz
  • 28,033
  • 14
  • 88
  • 124
12
votes
1 answer

What's best practice for HA gearman job servers

From gearman's main page, they mention running with multiple job servers so if a job server dies, the clients can pick up a new job server. Given the statement and diagram below, it seems that the job servers do not communicate with each other. Our…
Paul DelRe
  • 3,845
  • 1
  • 21
  • 25
12
votes
1 answer

Multi-homed SQL Server with High Availability Groups

We have two servers (SQL-ATL01, SQL-ATL02) that make up a Failover Cluster, each running as part of a SQL Server High Availability Group (HAG). Each server has two network cards. One is a 10Gbit card that is directly connected to the other server…
Josef
  • 6,979
  • 3
  • 29
  • 33
12
votes
1 answer

Highly available Service Fabric WebApi hosted on Azure

We are exposing a stateless OWIN WebAPI hosted on all nodes in our Service Fabric cluster (instance count -1) on Azure. The WebAPI is meant for public consumption and should be highly available even in the face of upgrades to the internal services…
12
votes
1 answer

RabbitMQ clustering and mirror queues behavior behind the scenes

Can someone please explain what is going on behind the scenes in a RabbitMQ cluster with multiple nodes and queues in mirrored fashion when publishing to a slave node? From what I read, it seems that all actions other than publishes go only to the…
Cosmin Vasii
  • 969
  • 2
  • 10
  • 17
12
votes
1 answer

S3 high-availability + reliability for backups

I did some research on this, but wasn't able to find any substantial answers, so turning to StackOverflow. How reliable is Amazon's S3 in terms of high-availability and reliability? I realize there are SLAs for it, but what about if an availability…
Suman
  • 8,407
  • 5
  • 43
  • 61
11
votes
2 answers

mobile detection high traffic site

I have a high-traffic website (1+ million visitors per day) and I need to detect their user agent. I have a list of over 1000 mobile devices. I run memcache to output dynamic content based on what page they access and params they put…
aki
  • 1,211
  • 2
  • 13
  • 42
11
votes
2 answers

Airflow setup for high availability

How to deploy the Apache Airflow (formerly known as Airbnb's Airflow) scheduler in high availability? I am not asking about the backend DB or RabbitMQ that should obviously be deployed in high availability configuration. My main focus is the scheduler -…
Ofer Eliassaf
  • 2,651
  • 1
  • 14
  • 20
11
votes
1 answer

How to learn about designing highly transactional systems?

I have been mostly working on data analysis, BI tools, etc. in my career. Most of the applications I work on are largely read-only. I have also worked on simple CRUD applications, but nothing extraordinarily transactional. As a…