Web App: High Availability / How to prevent a single point of failure?

Question

Can someone explain to me how high availability ("HA") works for a web application? I ask because I assume HA means that no single point of failure exists.

However, even if a load balancer is used, isn't that load balancer itself a single point of failure?

@Dave Newton, but how do 2 load balancers answer the single request coming in? I'm trying to imagine it: let's say I want to visit http://example.com. My browser resolves the IP address and then sends a single request to the IP of example.com. How is it possible that multiple servers (load balancers) can "answer" the web request coming in from my browser? At some point, isn't there a single piece of hardware that is the point of failure? – nickb, Oct 30 '11 at 19:26
They don't; one does. If one starts to fail, the other takes over. There are a variety of mechanisms to handle this, all beyond the scope of an SO question, really. Desmond already pretty much said all that. — Dave Newton, Oct 30 '11 at 19:30
Argh. I feel your frustration, nickb. It's very clear that just changing your IP address to point at a load-balancer (or a load-balancer-balancer, or a load-balancer-balancer-balancer) doesn't achieve high-availability, because then *that* load balancer can fail. Yet answers to this question all over the net seem to consist of either *"Just add another layer of load balancing!"* (which plainly doesn't help) or *"This is a very complicated topic that you are too noob to understand"*. @DaveNewton has managed to provide *both* unhelpful dismissals, here. — Mark Amery, Apr 05 '18 at 16:09
@MarkAmery Fault-tolerance is *well* beyond the scope of an SO answer, even if it *was* on-topic. Nonetheless, despite your cries of "oh that doesn’t help" that’s the answer: scaling out balancers/servers/infra is the solution. — Dave Newton, Apr 05 '18 at 16:54
@DaveNewton No, it's *really obviously not* the solution. Making your IP resolve to a single entry-point load balancer is just as much of a single point of failure as having it resolve to a single web server, whether that load balancer has one or 100 more layers of load balancers behind it. What exactly is hard to understand here? The real solution clearly involves something other than just scaling out layers of load balancers. (I think it involves doing clever things with BGP, though that's way outside my area of expertise.) — Mark Amery, Apr 06 '18 at 12:58
@MarkAmery Which is why I said multiple balancers? I’m not *sure* what’s hard to understand here: to eliminate single points of failure you implement failovers. They can fail too—the point is to have redundancy and hope failures can be resolved. How do you think large websites work? Multiple points of entry, app servers, DBs. Switchable fabric to re-route requests, internal or external, when failures are detected. I don’t know of any mid- to large-scale site that has single *anything*. Shrug—it’s been working for every site I’ve been involved with, from 10sK to 10sM. — Dave Newton, Apr 06 '18 at 13:09
@DaveNewton *"Which is why I said multiple balancers?"* - co-ordinated *how*, if not by another load balancer in front of them? The entire question here is what *mechanism* there is by which it's possible to let one server (or load balancer) take over when another fails besides just sticking another SPOF in front of them. I have no idea what that mechanism is, which is why I ended up here; throwing more layers at the problem clearly doesn't solve it. Maybe it's the "switchable fabric" you allude to, although I don't know what "fabric" or "sK" or "sM" are and none of them yield to Googling. — Mark Amery, Apr 06 '18 at 13:25
@MarkAmery Those are numbers of users. I think we're talking past each other, but there are many resources you could scan to understand the basics of HA infrastructure. – Dave Newton, Apr 06 '18 at 14:58
@MarkAmery Agree with you, which is why I'm reading all through the end of the chat — Sankarganesh Eswaran, Aug 17 '18 at 13:23
Clearly it all comes down to ensuring the DNS-resolved first load balancer is HA. There must be a system to monitor its availability (like sentinel in redis), which -- e.g. by a quorum decision -- can decide the load balancer went down, and issue commands to a hot-standby replacement to take over (e.g. assume the IP DNS is resolving to). — P Marecki, Mar 26 '20 at 16:24
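To make the quorum idea in the last comment concrete, here is a minimal sketch. It assumes several independent monitors each probe the primary load balancer, and the standby only takes over the IP when a strict majority report the primary as down. The names `Monitor`, `primary_is_down` and `choose_ip_owner` are hypothetical, and the actual IP takeover (for example a gratuitous ARP or a routing change) is not shown.

```python
from dataclasses import dataclass


@dataclass
class Monitor:
    """One independent health checker (hypothetical)."""
    name: str
    sees_primary_up: bool  # result of this monitor's latest probe


def primary_is_down(monitors):
    """Declare the primary down only if a strict majority of monitors agree."""
    down_votes = sum(1 for m in monitors if not m.sees_primary_up)
    return down_votes > len(monitors) // 2


def choose_ip_owner(monitors, primary, standby):
    """Return the node that should hold the public IP after this probe round."""
    return standby if primary_is_down(monitors) else primary
```

Requiring a majority means that a single monitor losing its own network link cannot trigger a failover by itself.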

score 19 · Accepted Answer · answered Mar 07 '13 at 05:31

I have found this article on the subject: http://www.tenereillo.com/GSLBPageOfShame.htm

Basically if you do not require long lasting sticky sessions you can configure your DNS servers to return multiple A records (IP addresses) for your website.

Web browsers are smart enough to try all the addresses until they find one that works.
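As a rough illustration of what "try all the addresses" means for a client, here is a sketch using only the Python standard library. It is not how any particular browser behaves; the helper names and the 2-second timeout are assumptions. Note that each dead address can cost up to one full timeout before the next one is tried.

```python
import socket


def resolve_all(host, port):
    """Return every distinct IP address the system resolver gives for host."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses


def connect_first_working(addresses, port, timeout=2.0,
                          connect=socket.create_connection):
    """Try each address in order; return (address, connection) for the first success."""
    last_error = None
    for address in addresses:
        try:
            return address, connect((address, port), timeout)
        except OSError as exc:
            last_error = exc  # unreachable or timed out, move on to the next address
    raise ConnectionError(f"no address reachable: {last_error}")
```

A caller would do something like `connect_first_working(resolve_all("example.com", 80), 80)`. The `connect` parameter exists only so the loop can be exercised without a network.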

user677686

-1; this contradicts multiple sources I've seen (example: https://serverfault.com/a/328321/147556) that claim that returning multiple A records (AKA "round robin DNS") does *not* result in browsers (which are the main kind of HTTP clients we care about when talking about websites) rapidly cycling through the IPs to find one that works in the event of a failure, but instead incurs long timeouts, and that as such having multiple IPs in an A record is not a solution to "high-availability". Maybe everyone else is wrong, or maybe things have changed since 2010, but I cautiously assume not. – Mark Amery Apr 05 '18 at 16:31
We can't even trust browsers to consistently run the same line of JavaScript. Not sure I'd be comfortable relying on them to round-robin a list of IPs. – Damien Roche Apr 23 '18 at 22:17

Techie · Answer 2 · 2015-08-18T08:34:00.200

In simple terms, high availability means running a system 24×7 without downtime, even when hardware or software fails. In other words, it is a fault-tolerant application. This helps ensure uninterrupted use of the application for its intended users.

Read more on High Availability Deployment Architecture

edited Aug 18 '15 at 08:34

answered Aug 17 '15 at 13:13

Techie

deepak goyal · Answer 3 · 2021-02-03T19:07:20.830

Sure it is, when operated alone. A typical highly available setup includes two or more load balancers running in a cluster, in either an active/active or an active/passive configuration. To further increase availability, you can have two different Internet Service Providers (or geo-distributed datacenters), each running a pair of clustered load balancers. Then you configure a DNS A record that resolves to two distinct public IP addresses, so that round-robin DNS spreads requests roughly evenly across them (CloudFlare is very fast and reliable at this). It is also possible to return the IP address of the datacenter closest to the client's location by using something like PowerDNS dnsdist. This is what big players do to make their services highly available.

Please read https://docs.oracle.com/cd/E23824_01/html/821-1453/gkkky.html for more clarity. In practice, both load balancers use the same VIP (Virtual IP Address; see https://techterms.com/definition/vip).
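The shared-VIP idea in an active/passive pair can be sketched roughly like this. It is a simplified model, not the actual VRRP or keepalived algorithm: each balancer has a priority and a last-heartbeat timestamp, and the VIP goes to the highest-priority node that has been heard from recently. The `Balancer` class, `vip_owner` function and the 3-second `dead_after` threshold are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Balancer:
    name: str
    priority: int          # higher wins, loosely like a VRRP priority
    last_heartbeat: float  # time in seconds when the node was last heard from


def vip_owner(balancers, now, dead_after=3.0):
    """Return the name of the live balancer that should hold the virtual IP, or None."""
    alive = [b for b in balancers if now - b.last_heartbeat <= dead_after]
    if not alive:
        return None
    return max(alive, key=lambda b: b.priority).name
```

Clients only ever see the one virtual IP; which physical machine answers for it changes when the active node stops sending heartbeats.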

score 1 · Answer 4 · answered Feb 25 '12 at 23:26

It works as follows: you set up two HAProxy servers with heartbeat, so that when one fails (stops responding to queries), it is removed from the cluster. Requests from HAProxy can be forwarded to the web servers in round-robin fashion, and if one web server fails, the HAProxy servers do not try to contact it until it is alive again. The web servers store all dynamic information in a database, which is replicated across two MySQL instances. As you can see, HAProxy, clustered MySQL (or simply MySQL replication), and IP clustering are the key here.

example high availability cluster
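The "round robin, but skip dead web servers" behaviour described in this answer can be modelled with a few lines. This is only a sketch of the idea, not HAProxy's implementation; `RoundRobinPool` and its methods are hypothetical, and in a real setup a health checker would call `mark_down` and `mark_up`.

```python
import itertools


class RoundRobinPool:
    """Rotate over backends, skipping any that are currently marked down."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.down = set()
        self._cycle = itertools.cycle(self.backends)

    def mark_down(self, backend):
        self.down.add(backend)

    def mark_up(self, backend):
        self.down.discard(backend)

    def next_backend(self):
        # Look at each backend at most once per call so we never loop forever.
        for _ in range(len(self.backends)):
            candidate = next(self._cycle)
            if candidate not in self.down:
                return candidate
        raise RuntimeError("no healthy backend available")
```

With three servers and `web2` marked down, successive calls alternate between the two remaining servers until `web2` is marked up again.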

Andrew

But in your diagram, what I don't understand is how HAProxy works. When the client's DNS lookup resolves, it can only resolve to a single machine. So are the HAProxy servers somehow sharing the same IP address? – nickb May 02 '13 at 20:35
@nickb as Dave Newton responded above, the DNS can be configured to return multiple IP addresses for one external hostname. The client can then make multiple attempts to contact the service. See 'A RECORDS' and 'CNAME RECORDS' with respect to DNS configuration. – simon.watts Nov 24 '14 at 14:23
@nickb You are right, the HA service can enable the HA Proxies to share a single virtual IP that the Client will connect to. The HA service for unix can be (u)carp and keepalived, RedHat Cluster Suite or Pacemaker, etc. See also: http://serverfault.com/questions/686878/how-to-make-redundant-load-balancers – Yuci Mar 11 '17 at 18:13

score 0 · Answer 5 · answered Oct 30 '11 at 04:03

HA architecture is an entire field, and multiple books have been written on it, so it is hard to answer in a short paragraph.

To sum up the ideal situation: you would use multiple servers, interconnected with a layer of multiple load balancers. The nodes and LBs would be located in a few different data centers and connected to different network backbones. Ideally, the data centers would be located all over the world.

In short, every component will have redundancy, including the load balancers.

For a starting point, see Wikipedia for High Availability Cluster

Desmond Zhou

But at some point, the single request from the user's web browser will have to be split across multiple load balancers. At this point, wouldn't that be a single point of failure? Meaning, how is it possible for a single request to come into multiple load balancers? – nickb Oct 30 '11 at 04:13
Yes, the user's request will end up at ONE of the load balancers that is online, and it is possible that the LB goes down at precisely the moment it is processing the request, losing it. The important thing HA addresses is that if the user immediately retries, they will end up at another LB that is online and succeed, as will the other users of the system. HA is concerned with the whole system being available (all failures transient), rather than with any particular request being successful. – Desmond Zhou Oct 30 '11 at 04:22
How do you do that? DNS round robin? – nickb Oct 31 '11 at 04:36