
I want to test the throughput of my system, which connects to a PostgreSQL database. The system consists of two main components: a ThreadPoolExecutor created via newFixedThreadPool with a maximum of 10 threads, and a PGPoolingDataSource called connectionPool with a maximum of 10 connections to the database. I call stored procedures in the Postgres database; the stored procedure does a simple insert and returns an error message if the insert failed. A single call of this stored procedure takes about 20-30 ms.

The system works like this: The main thread creates message tasks and passes them to the thread pool. A message task does the following: it gets a connection from the connection pool and calls a stored procedure on the postgres server. It waits for the response and then the task is finished. The thread in the thread pool can now work on a new message task.

Now, I think this should work fine, and it does to some extent, but it is very slow and I have absolutely no idea why. With the following code I record about 300-500 inserts per second, when it should be 6000 inserts per second. In the system monitor I see that all CPUs are at about 20% load. When I uncomment the section indicated with (1), one CPU is at 100% load while the others are around 0%, which is a mystery to me.

If anyone can shed some light on what I'm doing wrong, that would be great. Is it that my Postgres server is not configured correctly? When I use the top command, it shows that java uses about 20% CPU and there are 8 postgres processes which each use about 3%. (I'm on Ubuntu 14.04 using Eclipse.)
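As a sanity check on those numbers: with 10 pooled connections and one synchronous 20-30 ms call in flight per connection, throughput is capped at roughly 330-500 inserts per second no matter how idle the CPUs are. A minimal sketch of that bound (class and method names are illustrative):

```java
public class ThroughputCheck {

    // Upper bound on inserts/second when each of `connections` is
    // blocked for `latencyMs` per synchronous stored-procedure call.
    static double maxInsertsPerSecond(int connections, double latencyMs) {
        return connections * (1000.0 / latencyMs);
    }

    public static void main(String[] args) {
        System.out.println(maxInsertsPerSecond(10, 20.0)); // 500.0 (best case)
        System.out.println(maxInsertsPerSecond(10, 30.0)); // ~333  (worst case)
        // Reaching 6000 inserts/s at ~25 ms per call would need ~150
        // connections, or fewer and shorter round-trips (e.g. batching).
    }
}
```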

Here is my MainTester code, containing the main function. It creates the thread pool and the database connection pool.

public class MainTester {

    public static ThreadPoolExecutor threadPoolExecutor;
    public static PGPoolingDataSource connectionPool;

    public static void main(String[] args) {

        establishConnectionPool(10);
        threadPoolExecutor =
                (ThreadPoolExecutor) Executors.newFixedThreadPool(10);

        Operator operator = new Operator(1, 2, 30);
        operator.run();
        // I created another thread here before.
        // Now I just use the main thread to run the operator.
    }


    private static void establishConnectionPool(int nrOfConnections) {
        connectionPool = new PGPoolingDataSource();
        connectionPool.setDataSourceName("ConnectionPool");
        connectionPool.setServerName(dbServerName);
        connectionPool.setDatabaseName(dbName);
        connectionPool.setUser(dbUser);
        connectionPool.setPassword(dbPassword);
        connectionPool.setMaxConnections(nrOfConnections);
        // dbServerName, dbName, dbUser and dbPassword are defined elsewhere
    }
}

This is my Operator code. It spawns message tasks and hands them over to the thread pool. I want to let it run for 2 minutes and then check the number of messages it has inserted. I want to keep the queue of the thread pool full at all times, which is why I check whether the queue of the thread pool has fewer than 1000 tasks. If it does, I spawn new tasks for the thread pool to chew on.

public class Operator implements Runnable {

    private int minutesToRun = 2;

    @Override
    public void run() {

        long startTime = System.currentTimeMillis();

        while (System.currentTimeMillis() - startTime < minutesToRun * 60 * 1000 + 10) {

            while (MainTester.threadPoolExecutor.getQueue().size() < 1000) {
                MessageTask messageTask = new MessageTask(QueueOperation.SEND, 1, 1, 1, "abc");
                MainTester.threadPoolExecutor.execute(messageTask);
            }

            try { // (1)
                Thread.sleep(100);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }
}

(1) When I don't sleep here, the system monitor shows one CPU at 100% and the others at 0%. This makes no sense to me. Sure, this method would fully occupy one CPU, but the threads from the thread pool should work on the other CPUs, no?
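Incidentally, the spinning producer loop can be avoided altogether: instead of polling getQueue().size(), a ThreadPoolExecutor can be built with a bounded queue and CallerRunsPolicy, so a producer that finds the queue full simply runs the task itself. A self-contained sketch with dummy tasks (class name and counts are illustrative, not from the question):

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedSubmit {

    // Submits nTasks no-op tasks through a pool whose queue holds at most
    // 1000 entries; when the queue is full, CallerRunsPolicy makes the
    // submitting thread execute the task itself instead of busy-spinning.
    static int runTasks(int nTasks) throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                10, 10, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>(1000),
                new ThreadPoolExecutor.CallerRunsPolicy());

        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < nTasks; i++) {
            pool.execute(done::incrementAndGet);
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return done.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runTasks(10_000)); // prints 10000
    }
}
```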

Here is my code for the message task:

public class MessageTask implements Runnable {

    private QueueOperation operation;
    private int senderId;
    private int receiverId;
    private int queueId;
    private String message;

    public MessageTask(QueueOperation op, int senderId, int receiverId, int queueId, String message) {
        operation = op;
        this.senderId = senderId;
        this.receiverId = receiverId;
        this.queueId = queueId;
        this.message = message;
    }

    @Override
    public void run() {

        Connection connection = null;
        try {
            connection = MainTester.connectionPool.getConnection();
        } catch (SQLException e) {
            e.printStackTrace();
        }

        try {
            Statement statement = connection.createStatement();

            String dbStoredProcedure = "SELECT send(" + senderId + "," + receiverId + ","
                    + queueId + ",'" + message + "');";

            ResultSet resultSet = statement.executeQuery(dbStoredProcedure);
            resultSet.next();
            String dbResponse = resultSet.getString(1);
        }
        catch (SQLException e) {
        }
        finally {
            try {
                connection.close();
            } catch (SQLException e) {
                e.printStackTrace();
            }
        }
    }
}

So my question is: why is it so slow? Why are all 8 of my CPUs only at about 20% load? Perhaps I've configured my PostgreSQL server incorrectly? I haven't changed anything in the default configuration. Have I misunderstood how the thread pool works? Or does the connection pool not work as I intended?

  • Why are you using a stored procedure for doing a simple `INSERT`? There is always inherent slowness when you invoke SPs from a `SELECT` due to context switching. Also, not using bind variables further slows down the parsing that Postgres has to do before executing the statement. – Mick Mnemonic Nov 01 '15 at 12:38
  • Essential questions for parallel insert speed: 1. when do COMMITs occur? At each insert? 2. is there a primary key in the table? #1 is good to avoid locking but bad for performance. #2 is the opposite. – Daniel Vérité Nov 01 '15 at 12:52
  • And what is disk load? – Alexei Kaigorodov Nov 01 '15 at 13:07
  • So it takes around 20 ms for an SP call, which means you can get around 50 calls per second per thread in the best case. You're using 10 threads... Where do you get your 6000 number from? – Voo Nov 01 '15 at 13:36
  • You must close the resultsets, close the statement and then close the connection. Also why one connection does only one insert? This is insane. One connection (per thread) may be used for multiple inserts. – Alexandros Nov 01 '15 at 14:41
  • @Alexandros the connection comes from a pool. When it's closed, the physical connection is not actually closed: it simply goes back to the pool. – JB Nizet Nov 01 '15 at 15:13
  • @JB Nizet. And the reason for closing the connection is? If you have a maximum of 10 threads, then exactly 10 connections are enough. Why have a pool for hundred connections, instead of pool with 10 connections? – Alexandros Nov 01 '15 at 15:21
  • The reason for closing the connection is to return it to the pool. If you don't close it, every thread is blocked after 10 inserts. The pool does have 10 connections. Have you read the code? – JB Nizet Nov 01 '15 at 15:39
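The bind-variable and resource-closing points raised in the comments could look like this in the task. A sketch only, assuming the same send() function and a pool exposed as a DataSource; everything else is illustrative:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import javax.sql.DataSource;

public class MessageTaskSketch implements Runnable {

    // Bind variables instead of string concatenation: Postgres can reuse
    // the parsed statement, and the message text cannot break the SQL.
    static final String CALL = "SELECT send(?, ?, ?, ?)";

    private final DataSource pool;
    private final int senderId, receiverId, queueId;
    private final String message;

    MessageTaskSketch(DataSource pool, int senderId, int receiverId,
                      int queueId, String message) {
        this.pool = pool;
        this.senderId = senderId;
        this.receiverId = receiverId;
        this.queueId = queueId;
        this.message = message;
    }

    @Override
    public void run() {
        // try-with-resources closes the ResultSet, the statement and the
        // connection (returning it to the pool) in the right order.
        try (Connection connection = pool.getConnection();
             PreparedStatement statement = connection.prepareStatement(CALL)) {
            statement.setInt(1, senderId);
            statement.setInt(2, receiverId);
            statement.setInt(3, queueId);
            statement.setString(4, message);
            try (ResultSet resultSet = statement.executeQuery()) {
                if (resultSet.next()) {
                    String dbResponse = resultSet.getString(1);
                    // handle the error message returned by send(), if any
                }
            }
        } catch (SQLException e) {
            e.printStackTrace(); // at minimum, surface the failure
        }
    }
}
```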

1 Answer

When you measure the stored proc execution time, you're probably not factoring in how long a commit takes. You also appear to be focusing on CPU and completely ignoring disk I/O and the cost of disk flushes.

300 transactions per second is a pretty reasonable number for a typical system with a basic SSD. So I'd say you're committing after each insert.

To get faster results you'll need to:

  • batch work into transactions that do multiple inserts;
  • enable a commit_delay and set synchronous_commit = off (has some data loss risk); or
  • get a faster disk
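The first bullet, batching, is where most of the win usually is: one COMMIT (one WAL flush) then covers many rows. A sketch in JDBC, assuming a plain INSERT into a hypothetical messages table rather than the question's send() function:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class BatchInserter {

    // Hypothetical target table and columns, for illustration only.
    static final String INSERT =
            "INSERT INTO messages (sender_id, receiver_id, queue_id, body) VALUES (?, ?, ?, ?)";
    static final int BATCH_SIZE = 1000;

    /** Inserts all rows in a single transaction, flushing every BATCH_SIZE rows. */
    static void insertAll(Connection connection, List<String> bodies) throws SQLException {
        boolean oldAutoCommit = connection.getAutoCommit();
        connection.setAutoCommit(false); // one commit for the whole run, not one per row
        try (PreparedStatement statement = connection.prepareStatement(INSERT)) {
            int pending = 0;
            for (String body : bodies) {
                statement.setInt(1, 1);   // fixed ids, as in the question's test
                statement.setInt(2, 1);
                statement.setInt(3, 1);
                statement.setString(4, body);
                statement.addBatch();
                if (++pending == BATCH_SIZE) {
                    statement.executeBatch();
                    pending = 0;
                }
            }
            if (pending > 0) {
                statement.executeBatch();
            }
            connection.commit(); // pay the disk flush once
        } catch (SQLException e) {
            connection.rollback();
            throw e;
        } finally {
            connection.setAutoCommit(oldAutoCommit);
        }
    }
}
```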

For more information see How to speed up insertion performance in PostgreSQL.

If you don't mind losing your entire dataset after a crash you could also use unlogged tables. But, really, after an unclean shutdown your data will be erased. Gone. Unrecoverable. So be sure you mean it.

– Craig Ringer