Questions tagged [fault-tolerance]

Fault tolerance refers to a system's capability to isolate, compensate for, and recover from failure with minimal impact on the end user. When using this tag, also include tags indicating the system and/or technology you are working with (as additional supporting metadata).

276 questions
11
votes
1 answer

Fault tolerance in MPICH/OpenMPI

I have two questions. Q1: Is there a more efficient way to handle errors in MPI than checkpoint/rollback? I see that if a node "dies", the program halts abruptly. Is there any way to go ahead with the execution after a node…
Param
10
votes
3 answers

Will an SQLite3 database file be damaged by a sudden power-off or OS crash?

I open the database file and obtain a database connection using the open() method of sqlite3, and the connection is not closed until the program exits. If an unexpected error occurs, such as a sudden power-off or OS crash, will the mode…
quantity
10
votes
1 answer

How is running out of memory handled in Erlang?

With the "let it crash" philosophy of Erlang, one would expect the entire VM not to crash if a process cannot allocate the memory needed to proceed with its operations; indeed, if the system had a heuristic to kill some process to free some memory,…
9
votes
1 answer

What's up with the [OptionalField] Attribute?

As I understand it, I have to adorn a new member in a newer version of my class with the [OptionalField] attribute when I deserialize an older version of my class that lacks this newer member. However, the code below throws no exception while the…
Dabblernl
9
votes
1 answer

Hystrix Execution Patterns

I'm trying to wrap my head around Hystrix and after reading their docs, still have a question about its usage patterns. For one, I don't understand the use case for when to use their Asynchronous execution vs. their Reactive execution. The only…
smeeb
8
votes
2 answers

How to discover that a Scala remote actor has died?

In Scala, an actor can be notified when another (remote) actor terminates by setting the trapExit flag and invoking the link() method with the second actor as a parameter. In this case, when the remote actor ends its job by calling exit(), the first one…
Mario Fusco
8
votes
3 answers

Hystrix Request Caching by Example

I am trying to figure out how Hystrix request caching works but am not following the wiki or end-to-end examples they provide in their docs. Essentially I have the following HystrixCommand subclass: public class GetFizzCommand extends…
IAmYourFaja
8
votes
2 answers

Handling Faults in Akka actors

I have a very simple example where I have an Actor (SimpleActor) that performs a periodic task by sending a message to itself. The message is scheduled in the constructor for the actor. In the normal case (i.e., without faults) everything works fine.…
Soumya Simanta
8
votes
4 answers

What to do if the leader fails in Multi-Paxos for master-slave systems?

Background: In Section 3, "Implementing a State Machine", of Lamport's paper Paxos Made Simple, Multi-Paxos is described. Multi-Paxos is used in Google's Paxos Made Live. (Multi-Paxos is used in Apache ZooKeeper.) In Multi-Paxos, gaps can…
hengxin
8
votes
4 answers

How do supervisor processes monitor processes? Can the same be done on the JVM?

Erlang fault tolerance (as I understand it) includes the use of supervisor processes to keep an eye on worker processes, so if a worker dies the supervisor can start up a new one. How does Erlang do this monitoring, especially in a distributed…
Alan Kent
7
votes
2 answers

Building a fault-tolerant soft real-time web application with Erlang/OTP

I would like to build a fault-tolerant soft real-time web application for a pizza delivery shop. It should help the pizza shop to accept phone calls from customers, put them as orders into the system (via a CRM web client) and help the dispatchers…
skanatek
6
votes
3 answers

Fault (radiation) tolerant soft core?

Is there a certification or some authority that decides if a soft core is fault tolerant or not? Another question. I've seen that LEON3-FT is radiation tolerant only when implemented on the RTAX Actel FPGA. Is that right? Excuse me but I'm confused…
6
votes
8 answers

Is it not possible to make a C++ application "Crash Proof"?

Let's say we have an SDK in C++ that accepts some binary data (like a picture) and does something. Is it not possible to make this SDK "crash-proof"? By crash I primarily mean forceful termination by the OS upon memory access violation, due to…
Enno Shioji
6
votes
1 answer

How to deploy ZooKeeper across multiple data centers and fail over?

I would like to know about the existing approaches for running ZooKeeper across data centers. One approach that I found after doing some research is to use observers. That approach is to have only one ensemble in the main data…
6
votes
1 answer

Microservices styles and tradeoffs - Akka cluster vs Kubernetes vs

So, here's the thing. I really like the idea of microservices and want to set it up and test it before deciding if I want to use it in production. And then, if I do want to use it, I want to slowly chip away pieces of my old Rails app and move logic…
Matjaz Muhic