Fault tolerance refers to a system's capability to isolate, compensate for, and recover from failure with minimal impact on the end user. When using this tag, also include tags for the system or technology you are working with, as supporting metadata.
Questions tagged [fault-tolerance]
276 questions
6
votes
2 answers
Apache Storm: Track tuples by unique ID from Source Spout to Final Bolt
I want a method of uniquely identifying tuples throughout a whole Storm topology, so that each tuple can be tracked from Spout to the final Bolt.
As I understand it, when passing a unique message ID with an emit from a spout, for example:…
perkss
- 872
- 10
- 30
6
votes
1 answer
How does detection of terminated nodes work in Erlang? How does net_ticktime influence node liveness checks?
I set the net_ticktime value to 600 seconds.
net_kernel:set_net_ticktime(600)
In the Erlang documentation for net_ticktime = TickTime:
Specifies the net_kernel tick time. TickTime is given in seconds. Once every TickTime/4 second, all connected nodes are…
Zuzana
- 132
- 6
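For the net_ticktime question above, the arithmetic can be sketched as follows. The tick interval of TickTime/4 comes from the quoted documentation; the detection window of TickTime minus or plus TickTime/4 is how the Erlang kernel documentation describes it as far as I recall, so treat those bounds as an assumption and check them against the documentation for your OTP release. The function names are hypothetical.

```python
def tick_interval(tick_time: float) -> float:
    """Seconds between ticks sent to connected nodes (TickTime / 4)."""
    return tick_time / 4


def detection_window(tick_time: float) -> tuple[float, float]:
    """Assumed bounds (min, max) in seconds for detecting an unresponsive node.

    Assumption: MinT = TickTime - TickTime/4 and MaxT = TickTime + TickTime/4.
    """
    quarter = tick_time / 4
    return (tick_time - quarter, tick_time + quarter)


# With net_ticktime set to 600 seconds, as in the question:
interval = tick_interval(600)
min_t, max_t = detection_window(600)
print(f"tick every {interval} s, node considered down after {min_t} to {max_t} s")
```

Under these assumptions, a net_ticktime of 600 means a dead node may go unnoticed for several minutes, which is the trade-off against false positives on a slow network.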
6
votes
1 answer
ServiceStack Redis reconnect after Redis server reboot
We are using ServiceStack's RedisClient's BlockingDequeue to persist some data until it can be processed. The calling code looks like
using (var client = ClientPool.GetClient())
return…
swestner
- 1,766
- 14
- 17
6
votes
1 answer
Store and forward HTTP requests with retries?
Twilio and other HTTP-driven web services have the concept of a fallback URL, where the web service sends a GET or POST to a URL of your choice if the main URL times out or otherwise fails. In the case of Twilio, they will not retry the request if…
Matt J
- 39,051
- 7
- 44
- 56
6
votes
2 answers
More graceful error handling in C++ library - jsoncpp
I'm not sure if this will be a specific thing with jsoncpp or a general paradigm with how to make a C++ library behave better. Basically I'm getting this trace:
imagegeneratormanager.tsk: src/lib_json/json_value.cpp:1176: const Json::Value& …
djechlin
- 54,898
- 29
- 144
- 264
5
votes
3 answers
Disable tolerance (or enable strictness) in Firefox when rendering HTML
Firefox has a certain tolerance when rendering bad HTML. This means even if a closing tag is left out, the HTML will be displayed as if everything was fine. This tolerance aspect is particularly relevant when one is using JavaScript to manipulate or…
unode
- 8,181
- 4
- 30
- 44
5
votes
1 answer
Handle Akka actor bounded mailbox MessageQueueAppendFailedException
To avoid OOM, I'm bounding the mailbox size of some of my Akka 1.1.3 actors with a shared custom dispatcher. For example:
object Static {
val dispatcher = Dispatchers.newExecutorBasedEventDrivenWorkStealingDispatcher(
…
Bluu
- 4,327
- 4
- 27
- 33
5
votes
5 answers
Fail fast finally clause in Java
Is there a way to detect, from within the finally clause, that an exception is in the process of being thrown?
See the example below:
try {
// code that may or may not throw an exception
} finally {
SomeCleanupFunctionThatThrows();
//…
Greg Rogers
- 33,366
- 15
- 63
- 93
5
votes
2 answers
Misunderstanding of Spark RDD fault tolerance
Many say:
Spark does not replicate data in HDFS.
Spark arranges the operations in a DAG. Spark builds RDD lineage. If an RDD is lost, it can be rebuilt with the help of the lineage graph.
So there is no need for data replication, as the RDDs can be…
Gary Gauh
- 3,847
- 3
- 27
- 38
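The lineage idea quoted in the question above can be illustrated with a toy model. This is not the Spark API: `ToyRDD`, `compute`, and `lose` are hypothetical names, and the sketch assumes the base data can always be re-read from its source. It only shows that a lost intermediate result can be rebuilt by replaying the recorded transformations.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ToyRDD:
    """Toy dataset that remembers how it was derived (its lineage)."""
    parent: Optional["ToyRDD"] = None
    transform: Optional[Callable[[list], list]] = None
    source: Optional[Callable[[], list]] = None  # used only by the root dataset
    cache: Optional[list] = None

    def compute(self) -> list:
        """Return the data, rebuilding it from the lineage if it is missing."""
        if self.cache is not None:
            return self.cache
        if self.parent is None:
            data = self.source()  # re-read the base data
        else:
            data = self.transform(self.parent.compute())  # replay the step
        self.cache = data
        return data

    def lose(self) -> None:
        """Simulate losing the materialized data, for example on node failure."""
        self.cache = None


base = ToyRDD(source=lambda: [1, 2, 3, 4])
doubled = ToyRDD(parent=base, transform=lambda xs: [x * 2 for x in xs])
big = ToyRDD(parent=doubled, transform=lambda xs: [x for x in xs if x > 4])

first = big.compute()
doubled.lose()
big.lose()
rebuilt = big.compute()  # recomputed through the lineage, no replica needed
print(first, rebuilt)
```

Note that the rebuild still depends on the root `source` being readable, so in this toy model lineage replaces replication of derived data only, not durability of the input.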
5
votes
3 answers
How do I isolate untrusted native code in Java?
I have a C library that I don't trust (in the sense that it might crash frequently). I am calling it from a Java process.
To prevent a crash in the C library from bringing the whole Java app down, I figured it would be best if I spawn a…
Enno Shioji
- 25,422
- 13
- 67
- 104
5
votes
1 answer
How to robustly, but minimally, distribute items across a peer-to-peer system
If one has a peer-to-peer system that can be queried, one would like to
reduce the total number of queries across the network (by distributing "popular" items widely and "similar" items together)
avoid excess storage at each node
assure good…
John with waffle
- 4,043
- 18
- 32
5
votes
2 answers
Error monitoring/handling on webservers
We have a web server that we're about to launch a number of applications onto. They will all share database and memcached servers, but each application has its own MySQL database, and all memcached keys are prefixed per application.
Possible…
Industrial
- 36,181
- 63
- 182
- 286
5
votes
4 answers
Testing fault tolerant code
I’m currently working on a server application where we have agreed to try to maintain a certain level of service. The level of service we want to guarantee is: if a request is accepted by the server and the server sends an acknowledgement to the…
Robert
- 6,198
- 2
- 31
- 40
5
votes
2 answers
Articles about replication schemes/algorithms?
I'm designing a distributed system with a certain flow of data in it. I'd like to guarantee that at least N nodes have almost-current data at any given time.
I do not need complete consistency, only eventual consistency (i.e., for any time instant,…
jkff
- 16,670
- 3
- 46
- 79
5
votes
3 answers
How to configure fault tolerance programmatically for a Spring tasklet (not a chunk)
Programmatically configuring fault tolerance for a chunk works roughly as follows:
stepBuilders.get("step")
.chunk(1)
.reader(reader())
.processor(processor())
.writer(writer())
.listener(logProcessListener())
…
Horowitzathome
- 241
- 3
- 6