I have the following issues in our production environment (a web farm of 4 nodes behind a load balancer):

1) Timeout performing HGET key, inst: 3, queue: 29, qu=0, qs=29, qc=0, wr=0/0 at StackExchange.Redis.ConnectionMultiplexer.ExecuteSyncImpl[T](Message message, ResultProcessor`1 processor, ServerEndPoint server) in ConnectionMultiplexer.cs:line 1699

This happens 3-10 times a minute.

2) No connection is available to service this operation: HGET key at StackExchange.Redis.ConnectionMultiplexer.ExecuteSyncImpl[T](Message message, ResultProcessor`1 processor, ServerEndPoint server) in ConnectionMultiplexer.cs:line 1666

I tried to follow Marc's suggestion (maybe I interpreted it incorrectly) that it is better to have fewer connections to Redis rather than many. I ended up with the following implementation:

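using System;
using System.Net;
using StackExchange.Redis;

public class SeRedisConnection
{
    private static ConnectionMultiplexer _redis;

    private static readonly object SyncLock = new object();

    // True while the shared multiplexer exists and reports a live connection.
    private static bool IsHealthy()
    {
        return _redis != null && _redis.IsConnected && _redis.GetDatabase().IsConnected(default(RedisKey));
    }

    public static IDatabase GetDatabase()
    {
        if (!IsHealthy())
        {
            lock (SyncLock)
            {
                // Re-check inside the lock: a thread that waited here may find
                // that the previous lock holder has already reconnected.
                if (!IsHealthy())
                {
                    try
                    {
                        var configurationOptions = new ConfigurationOptions
                        {
                            AbortOnConnectFail = false
                        };
                        configurationOptions.EndPoints.Add(new DnsEndPoint(ConfigurationHelper.CacheServerHost,
                            ConfigurationHelper.CacheServerHostPort));

                        _redis = ConnectionMultiplexer.Connect(configurationOptions);
                    }
                    catch (Exception ex)
                    {
                        IoC.Container.Resolve<IErrorLog>().Error(ex);
                        return null;
                    }
                }
            }
        }
        return _redis.GetDatabase();
    }

    public static void Dispose()
    {
        // Guard against Dispose being called before any connection was created.
        if (_redis != null)
        {
            _redis.Dispose();
        }
    }
}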

Dispose is not actually called anywhere right now. Some specifics of my implementation might also contribute to this behavior (I'm only using hashes):

1. Add and Remove on hashes are async.
2. Get is sync.

Could somebody help me avoid this behavior?

Thanks a lot in advance!

SOLVED - by increasing the client connection timeout after evaluating our network capabilities.
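For readers wondering where such timeouts live, here is a minimal sketch, assuming the timeouts are set through StackExchange.Redis's `ConfigurationOptions`. The class name `RedisConnection`, the host name, and the millisecond values are hypothetical placeholders, not the settings used in this post. It also holds the multiplexer in a `Lazy<T>`, which is one common way to share a single connection per process.

```csharp
using System;
using StackExchange.Redis;

public static class RedisConnection
{
    // Hypothetical endpoint; replace with your own cache server.
    private const string Host = "cache.example.local";
    private const int Port = 6379; // default Redis port

    public static ConfigurationOptions BuildOptions()
    {
        var options = new ConfigurationOptions
        {
            // Keep retrying in the background instead of failing at startup.
            AbortOnConnectFail = false,
            // Milliseconds allowed for establishing the connection (assumed value).
            ConnectTimeout = 10000,
            // Milliseconds a synchronous call such as HGET may wait (assumed value).
            SyncTimeout = 5000
        };
        options.EndPoints.Add(Host, Port);
        return options;
    }

    // Lazy<T> is thread-safe by default, so Connect runs at most once
    // and every caller shares the same multiplexer.
    private static readonly Lazy<ConnectionMultiplexer> Connection =
        new Lazy<ConnectionMultiplexer>(() => ConnectionMultiplexer.Connect(BuildOptions()));

    public static IDatabase GetDatabase()
    {
        return Connection.Value.GetDatabase();
    }
}
```

Callers would use `RedisConnection.GetDatabase()` wherever they need an `IDatabase`. Raising the timeouts only gives slow operations more room; it does not remove whatever is making the server slow.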

UPDATE 2: Actually, that didn't solve the problem. Once the cache volume started to grow (e.g. from 2 GB upward), I saw the same pattern again: the timeouts happened about every 5 minutes, and our sites froze for a while every 5 minutes until the fork operation finished. Then I found out that Redis has an option to fork (save to disk) every x seconds:

save 900 1
save 300 10
save 60 10000

In my case it was "save 300 10": save every 5 minutes if at least 10 updates have happened. I also found out that "fork" can be very expensive. Commenting out the "save" section resolved the problem completely. We can comment out the "save" section because we use Redis only as an in-memory cache and don't need any persistence. Our cache servers run the "Redis 2.4.6" Windows port: https://github.com/rgl/redis/downloads
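As an illustration, the snapshot section of redis.conf with the three "save" lines quoted above commented out might look like the fragment below. This is only a sketch of that one section; the rest of the file is assumed to stay unchanged.

```conf
# RDB snapshotting disabled: with no active "save" line, Redis is not
# told to fork periodically and write the dataset to disk.
# save 900 1
# save 300 10
# save 60 10000
```

Newer Redis versions also document an explicit `save ""` directive for disabling snapshots; whether the 2.4.6 Windows port accepts it is not something this post confirms. Either way, this trades away persistence, so it only suits a pure cache whose contents can be rebuilt.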

Maybe this has been solved in recent versions of the Redis Windows port from MSOpenTech: http://msopentech.com/blog/2013/04/22/redis-on-windows-stable-and-reliable/ but I haven't tested it yet.

Anyway, StackExchange.Redis has nothing to do with this issue, and it runs very reliably in our production environment, thanks to Marc Gravell.

FINAL UPDATE: Redis is a single-threaded solution. It is extremely fast, but problems emerge when it has to release memory (removing items that are stale or expired): a single thread has to reclaim the memory (which is not a fast operation, whatever algorithm is used), and that same thread also has to handle the GET and SET operations. Of course, this shows up in a medium-loaded production environment. Even if you use a cluster with slaves, you will see the same behavior once the memory limit is reached.

Ross Presser
George Anisimov
  • Very helpful, thank you! I actually modified the time between saves to make sure there is never a huge queue of items to save to disk, which would cause enough lag to time out a query. – tommed Feb 12 '15 at 13:33
  • Hey tommed, well, unfortunately we stopped using Redis altogether, as its single-threaded architecture kept causing timeouts. Example: we had 32 GB / 4-node cache servers (clustering). When the maximum memory threshold is reached, Redis tries to release memory and then the timeouts happen. I admit that we used Redis heavily in production, and it worked perfectly until the memory threshold was reached. So we chose another, multithreaded caching solution. But as I said, it may work out for your volume and config; you should run some load tests at the point where the memory threshold is reached. – George Anisimov Feb 16 '15 at 09:49
  • Thanks, we have to process 10 million records per day, each of which requires several GETs to Redis. Most of them arrive within a peak hour. We have accepted that there will be times when Redis syncs to disk and locks its only thread, and we just wait patiently until this completes. After tweaking the I/O sync times to limit this issue (at the expense of a higher risk of losing records), we have found a good balance where Redis works well. Seems a shame that there isn't a separate thread for I/O syncing!! – tommed Mar 09 '15 at 17:15
  • If you are using ASP.NET, please see my answer at http://stackoverflow.com/questions/25416562/stackexchange-redis-with-azure-redis-is-unusably-slow-or-throws-timeout-errors which fixed the queue timeout errors for me, i.e. your issues 1 and 2. I tried other ways (singletons and locks) to maintain fewer connections, but with no luck! Hope this helps. – Sharat Pandavula Sep 20 '14 at 08:33
  • @GeorgeAnisimov "So we chose another multithreaded caching solution": can you tell us which solution that was? – mohammed sameeh Dec 06 '17 at 23:17
  • @mohammedsameeh - sure, it was microsoft.appfabric, but that was a long time ago; it might be that Redis has improved :) – George Anisimov Apr 11 '18 at 01:25

1 Answer

It looks like in most cases this exception is a client-side issue. Previous versions of StackExchange.Redis used Win32 sockets directly, which sometimes had a negative impact. ASP.NET's internal routing is probably related to it somehow.
The good news is that StackExchange.Redis's network infrastructure was recently rewritten completely. The latest version is 2.0.513. Try it; there is a good chance that your problem will go away.

Eric Aya
  • I am unfortunately still experiencing this issue using the latest version of StackExchange.Redis (v2.0.519) as included through the latest version (v4.0.1) of Microsoft.Web.RedisSessionStateProvider. As someone who was incredibly excited for this update, I'm so far finding it has not resolved my issues. I've made adjustments to our ThreadPool settings to ensure we aren't being throttled back by low minimums, evaluated our cache metrics in Azure, still seeing a lot of timeouts on the EVAL function. – Kaitebug Dec 17 '18 at 20:03
  • @Kaitebug I know this is almost a year old, but did you ever find a solution for it? We are using the latest version StackExchange.Redis (2.0.601.3402) and have tried the same things you list here, but still experiencing these issues constantly since upgrading. – blizz Oct 15 '19 at 21:20