
Hope you're all okay.

We hit this limit quite often. We know there is no way to raise Redshift's limit of 500 concurrent user connections. We also know that certain system views (pg_user_info) expose a user's actual connection limit.

We are looking for answers we haven't found on this forum, plus any guidance based on your experience.

Questions:

  • Would recreating the cluster with bigger node types (EC2 instances) yield a higher connection limit?
  • Would adding nodes to the existing cluster yield a higher connection limit?
  • From the app development perspective: what specific strategies/actions would you recommend to spot or predict a situation where this limit is about to be hit?

    Txs - Jimmy

  • geekjimbo
    • Do you actually have 500+ concurrent users of your Amazon Redshift cluster? Are these real people, or are they apps that are connecting? Do you really need that many connections? What are they all doing? – John Rotenstein Apr 18 '20 at 00:20

    2 Answers


    As you said, this is a hard limit in Redshift and there is no way to raise it. Redshift is not a high-concurrency / high-connection-count database.

    I expect that if you need the large-scale data analytics horsepower of Redshift, you can get around this with connection sharing (pooling), so that many clients reuse a small set of database connections. Pgpool is a common tool for this.
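
    On the application side this can be as simple as a shared connection pool, for example with psycopg2's built-in ThreadedConnectionPool. Just a rough sketch — the hostname, credentials, and pool sizes below are placeholders, not a recommendation:

    # Share a handful of Redshift connections across many threads so the app
    # never gets anywhere near the 500-connection cluster limit.
    # Hostname, database, and credentials are placeholders.
    from psycopg2.pool import ThreadedConnectionPool

    pool = ThreadedConnectionPool(
        minconn=1,
        maxconn=10,  # far below the cluster-wide limit of 500
        host="my-cluster.xxxxxxxx.us-east-1.redshift.amazonaws.com",
        port=5439,
        dbname="analytics",
        user="app_user",
        password="...",
    )

    def run_query(sql):
        """Borrow a pooled connection, run one query, and return the rows."""
        conn = pool.getconn()
        try:
            with conn.cursor() as cur:
                cur.execute(sql)
                return cur.fetchall()
        finally:
            pool.putconn(conn)  # return the connection to the pool instead of closing it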

    Bill Weiner

    Okay folks, thanks to all who answered.
    I opened a support ticket with AWS, and below is their recommendation. I'm pasting it all here; it's long, but I hope it helps the many people running into this issue. The idea is to catch the situation before it happens:

    To monitor the number of connections made to the database, you can create a CloudWatch alarm based on the DatabaseConnections metric that triggers a Lambda function when a certain threshold is reached. That Lambda function can then free up connections by calling a procedure that terminates idle sessions.
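
    (Sketching the alarm side myself, since it wasn't included in the reply: the alarm below watches the Redshift DatabaseConnections metric and notifies an SNS topic that the Lambda function can subscribe to. The cluster identifier, topic ARN, and threshold are placeholders.)

    import boto3

    cloudwatch = boto3.client("cloudwatch")

    # Alarm when average connections stay above 450 for one minute;
    # the cluster identifier and SNS topic ARN are placeholders.
    cloudwatch.put_metric_alarm(
        AlarmName="redshift-connections-near-limit",
        Namespace="AWS/Redshift",
        MetricName="DatabaseConnections",
        Dimensions=[{"Name": "ClusterIdentifier", "Value": "my-cluster"}],
        Statistic="Average",
        Period=60,
        EvaluationPeriods=1,
        Threshold=450,
        ComparisonOperator="GreaterThanThreshold",
        AlarmActions=["arn:aws:sns:us-east-1:123456789012:redshift-conn-alerts"],
    )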
    
    Please find below the queries that create a view, a logging table, and a procedure to log and terminate long-running inactive sessions:
    
    1. Add a view to get all currently inactive sessions in the cluster:
    
    -- Lists sessions that have no query currently running, with how long each
    -- has been open (session_dur) and idle (idle_dur), in seconds.
    CREATE OR REPLACE VIEW inactive_sessions as (
        select a.process,
        trim(a.user_name) as user_name,
        trim(c.remotehost) as remotehost,
        a.usesysid,
        a.starttime,
        datediff(s,a.starttime,sysdate) as session_dur,
        b.last_end,
        datediff(s,case when b.last_end is not null then b.last_end else a.starttime end,sysdate) idle_dur
            FROM
            (
                -- a: non-superuser sessions with nothing in flight or recently active
                select starttime,process,u.usesysid,user_name
                from stv_sessions s, pg_user u
                where
                s.user_name = u.usename
                and u.usesysid>1
                and process NOT IN (select pid from stv_inflight where userid>1
                union select pid from stv_recents where status != 'Done' and userid>1)
            ) a
            -- b: end time of the last statement each session ran
            LEFT OUTER JOIN (
                select
                userid,pid,max(endtime) as last_end from svl_statementtext
                where userid>1 and sequence=0 group by 1,2) b ON a.usesysid = b.userid AND a.process = b.pid

            -- c: remote host recorded when the session was initiated
            LEFT OUTER JOIN (
                select username, pid, remotehost from stl_connection_log
                where event = 'initiating session' and username <> 'rsdb') c on a.user_name = c.username AND a.process = c.pid
            WHERE (b.last_end > a.starttime OR b.last_end is null)
            ORDER BY idle_dur
    );
    
    2. Add a table for logging information about the long-running inactive sessions that were terminated:
    
    CREATE TABLE IF NOT EXISTS terminated_inactive_sessions (
        process int,
        user_name varchar(50),
        remotehost varchar(50),
        starttime timestamp,
        session_dur int,
        idle_dur int,
        terminated_on timestamp DEFAULT GETDATE()   
    );
    
    3. Add a procedure to log and terminate any sessions that have been inactive for longer than 'n' seconds:
    
    CREATE OR REPLACE PROCEDURE terminate_and_log_inactive_sessions (n INTEGER) 
    AS $$ 
    DECLARE
      expired RECORD ; 
    BEGIN
    FOR expired IN SELECT process, user_name, remotehost, starttime, session_dur, idle_dur FROM inactive_sessions where idle_dur >= n
    LOOP
    EXECUTE 'INSERT INTO terminated_inactive_sessions (process, user_name, remotehost, starttime, session_dur, idle_dur) values (' || expired.process || ' , ''' || expired.user_name || ''' , ''' || expired.remotehost || ''' , ''' || expired.starttime || ''' , ' || expired.session_dur || ' , ' || expired.idle_dur || ');';
    EXECUTE 'SELECT PG_TERMINATE_BACKEND(' || expired.process || ')';
    END LOOP ; 
    
    END ; 
    $$ LANGUAGE plpgsql;
    
    4. Execute the procedure by running the following command:
    
      call terminate_and_log_inactive_sessions(100);
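
    (A minimal sketch of the 'Lambda calls the procedure' variant described above — this part is mine, not from the support reply; the connection settings and the 100-second threshold are placeholders.)

    import os

    import psycopg2

    def lambda_handler(event, context):
        # Connection settings are placeholders, e.g. Lambda environment
        # variables or Secrets Manager in practice.
        conn = psycopg2.connect(
            host=os.environ["DB_HOST"],
            port=os.environ.get("DB_PORT", "5439"),
            dbname=os.environ["DB_NAME"],
            user=os.environ["DB_USER"],
            password=os.environ["DB_PASSWORD"],
        )
        conn.autocommit = True
        try:
            with conn.cursor() as cur:
                # Log and terminate sessions idle for 100+ seconds
                cur.execute("call terminate_and_log_inactive_sessions(100);")
        finally:
            conn.close()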
    
    
    Here is a sample Lambda function that attempts to close idle connections by querying the 'inactive_sessions' view created above; you can use it as a reference.
    
    # Imports and connection settings were not included in the pasted snippet;
    # the environment-variable names below are placeholders.
    import datetime
    import logging
    import os
    import sys

    import psycopg2

    db_database = os.environ["DB_DATABASE"]
    db_user = os.environ["DB_USER"]
    db_password = os.environ["DB_PASSWORD"]
    db_port = os.environ.get("DB_PORT", "5439")
    db_host = os.environ["DB_HOST"]
    session_idle_limit = int(os.environ.get("SESSION_IDLE_LIMIT", "100"))

    # Current time
    now = datetime.datetime.now()

    query = "SELECT process, user_name, session_dur, idle_dur FROM inactive_sessions where idle_dur >= %d"

    logger = logging.getLogger()
    logger.setLevel(logging.INFO)

    def lambda_handler(event, context):

        try:
            conn = psycopg2.connect("dbname=" + db_database + " user=" + db_user + " password=" + db_password + " port=" + db_port + " host=" + db_host)
            conn.autocommit = True
        except:
            logger.error("ERROR: Unexpected error: Could not connect to Redshift cluster.")
            sys.exit()

        logger.info("SUCCESS: Connection to RDS Redshift cluster succeeded")

        with conn.cursor() as cur:
            cur.execute(query % (session_idle_limit))
            row_count = cur.rowcount
            if row_count >= 1:
                result = cur.fetchall()
                for row in result:
                    print("terminating session with pid %s that has been idle for %d seconds at %s" % (row[0], row[3], now))
                    cur.execute("SELECT PG_TERMINATE_BACKEND(%s);" % (row[0]))
                conn.close()
            else:
                conn.close()
    
    geekjimbo