12

How is Erlang fault tolerant, or how does it help in that regard?

Blankman
  • 3
    I wish people would explain downvotes, for someone not familiar with the subject it might provide at least some interesting information. – Alex Paven Sep 21 '10 at 14:42
  • 6
    well the main page says it is fault tolerant, was curious how it provides this? – Blankman Sep 21 '10 at 14:56

4 Answers

5

I think I covered part of the answer in this reply to another thread.

Community
I GIVE TERRIBLE ADVICE
4

Erlang is fault tolerant with the following things in mind:

  • Erlang assumes that errors WILL happen and that things will break. Instead of trying to guard against every error, it gives you strong tools to minimize the impact of errors and to recover from them as they happen.

  • Erlang encourages you to program for the success case and to crash if anything goes wrong, rather than trying to recover partially broken data. The idea is that partially incorrect data may propagate further through your system and may get written to a database, which puts the whole system at risk. It is better to get rid of it early and keep only fully correct data.

  • Process isolation limits the impact of partially wrong data: when such data appears and makes a process crash, only that process dies. The runtime cleans up the crashed process and its memory, and the system as a whole keeps working.

  • Supervision and restart strategies keep your system functional when parts of it crash, by restarting the vital parts and bringing them back into service. If something goes so wrong that restarts happen too often, the system is considered broken beyond repair and is shut down.
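
A minimal sketch of the first three points, assuming a hypothetical module `let_it_crash` with an illustrative `parse_age/1` function (neither name comes from the answer itself). The parser is written only for the success case. It runs in its own monitored process, so bad input kills just that process and the caller merely observes the crash:

```erlang
-module(let_it_crash).
-export([demo/0, parse_age/1]).

%% Hypothetical parser, written for the success case only.
%% No defensive recovery: bad input simply crashes the calling process.
parse_age(Text) when is_list(Text) ->
    Age = list_to_integer(Text),   % raises badarg on input such as "abc"
    true = Age >= 0,               % raises badmatch on negative input
    Age.

%% Run the parser in a separate, monitored process. A crash there
%% cannot corrupt or take down the process that asked for the work.
demo() ->
    Parent = self(),
    Run = fun(Text) ->
        {Pid, Ref} = spawn_monitor(
            fun() -> Parent ! {result, self(), parse_age(Text)} end),
        receive
            {result, Pid, Age} ->
                %% Success: drop the monitor and any pending 'DOWN' message.
                erlang:demonitor(Ref, [flush]),
                {ok, Age};
            {'DOWN', Ref, process, Pid, Reason} ->
                %% The worker died before sending a result.
                {crashed, Reason}
        end
    end,
    [Run("42"), Run("abc")].
```

Calling `let_it_crash:demo()` should give `{ok, 42}` for the valid input and `{crashed, Reason}` for the invalid one, where `Reason` carries `badarg` and a stack trace. The caller stays alive in both cases.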

3

Caveat: I am an Erlang noob.

@Daniel's answer is essentially correct. I strongly suggest that you take the time to read Erlang creator Joe Armstrong's thesis (Making reliable distributed systems in the presence of software errors). The thesis provides a good explanation of the need for, and the solution to, developing robust distributed systems. I believe the paper will answer your question satisfactorily.

Manoj Govindan
  • Hehe, I'm also an Erlang noob, but I said what I said because it's the way in which an idiomatic Erlang system is fundamentally different from a system in virtually every other environment. Or rather, other systems seem to provide no advice about how to handle these issues, whereas Erlang does. – Daniel Yankowsky Sep 22 '10 at 01:27
2

Erlang makes it easy to create many small processes and to monitor them. When one of those processes crashes, it is often possible to restart just that part of the system without bringing the whole thing down.

You may have seen something like this in modern versions of Windows: the system can restart the graphics driver if it crashes; it doesn't kill the whole system.
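
The monitor-and-restart idea can be sketched by hand with plain processes. This is only an illustration under assumed names (the `keeper` module and its functions are hypothetical), not how production systems are normally built:

```erlang
-module(keeper).
-export([start/1, demo/0]).

%% Start a keeper process that runs WorkerFun in a monitored child and
%% starts a fresh child whenever the current one dies abnormally.
start(WorkerFun) ->
    spawn(fun() -> keep(WorkerFun) end).

keep(WorkerFun) ->
    {Pid, Ref} = spawn_monitor(WorkerFun),
    receive
        {'DOWN', Ref, process, Pid, normal} ->
            ok;                      % worker finished cleanly: stop keeping
        {'DOWN', Ref, process, Pid, _Crash} ->
            keep(WorkerFun)          % worker crashed: restart just this part
    end.

%% Crash the worker once and check that a new one took its place.
demo() ->
    Parent = self(),
    Worker = fun() ->
        Parent ! {started, self()},  % announce each (re)start to the demo
        receive
            crash -> exit(boom);     % simulate a failure
            stop  -> ok              % normal termination
        end
    end,
    start(Worker),
    First = receive {started, P1} -> P1 end,
    First ! crash,
    Second = receive {started, P2} -> P2 end,
    Second ! stop,
    {restarted, First =/= Second}.
```

Here `keeper:demo()` crashes the first worker on purpose. The keeper notices the `'DOWN'` message and starts a replacement with a new pid, while the process running the demo is unaffected.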

To make it easier to write fault-tolerant applications, Erlang provides the concept of supervisor processes. These processes monitor a number of child processes and know how to respond if a child dies. You can build a whole supervision tree, which gives you fine control over how different parts of the application behave when something fails. You can read more in the Erlang documentation.
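
As a rough sketch of what this looks like with OTP's `supervisor` behaviour: the module name `demo_sup`, the registered name `demo_worker`, and the trivial worker are all made up for illustration, and a real worker would usually be a `gen_server`.

```erlang
-module(demo_sup).
-behaviour(supervisor).
-export([start_link/0, init/1, start_worker/0]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

%% Supervisor callback: describes how and when children are restarted.
init([]) ->
    SupFlags = #{strategy => one_for_one,  % restart only the child that died
                 intensity => 3,           % tolerate at most 3 restarts...
                 period => 5},             % ...within any 5-second window
    Child = #{id => worker,
              start => {?MODULE, start_worker, []},
              restart => permanent,        % always restart this child
              type => worker},
    {ok, {SupFlags, [Child]}}.

%% Hypothetical worker: idles until it receives `crash`, then dies.
%% The supervisor calls this function, so spawn_link links the worker to it.
start_worker() ->
    Pid = spawn_link(fun() -> receive crash -> exit(boom) end end),
    register(demo_worker, Pid),
    {ok, Pid}.
```

After `demo_sup:start_link()`, sending `demo_worker ! crash` kills the worker, and the supervisor starts a new one under the same registered name. If the worker dies more than three times within five seconds, the supervisor gives up and terminates itself, which is the "broken beyond repair" behaviour: the failure is passed up to whatever supervises it.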

Daniel Yankowsky