INSERT with many subselects - performance and error selectivity

Question

I have a plpgsql function in which I want to add one row into a Data table where many columns are updated from subselects from tables TableA, TableB and Session:

CREATE TABLE TableA (
    a_id    SERIAL PRIMARY KEY,
    a_name  TEXT UNIQUE NOT NULL
);
CREATE TABLE TableB (
    b_id    SERIAL PRIMARY KEY,
    b_name  TEXT UNIQUE NOT NULL
);
CREATE TABLE Session (
    session_id SERIAL PRIMARY KEY
);
CREATE TABLE Data (
    session_id  INTEGER REFERENCES Session(session_id) NOT NULL,
    a_id        INTEGER REFERENCES TableA(a_id) NULL,
    b_id        INTEGER REFERENCES TableB(b_id) NULL
);

This is quite easy, but the function must be as fast as possible and also I need specific error messages to discriminate between subselect fails. Specifically:

invalid (or NULL) session id
invalid a name (if it was not NULL)
invalid b name (if it was not NULL)

First I tried most straightforward approach - just selected all values I needed, checked it for errors and then inserted the values:

CREATE OR REPLACE FUNCTION store_data(ssid INTEGER, a TEXT, b TEXT) RETURNS VOID
AS $$
DECLARE
    _a_id INTEGER = NULL;
    _b_id INTEGER = NULL;
BEGIN
    PERFORM 1 FROM Session WHERE session_id = ssid;
    IF NOT FOUND THEN
        RAISE EXCEPTION 'INVALID SESSION: %', ssid;
    END IF;
    IF a_name IS NOT NULL THEN
        SELECT INTO _a_id a_id
            FROM TableA WHERE a_name = a;
        IF NOT FOUND THEN
            RAISE EXCEPTION 'INVALID A NAME: %', a;
        END IF;
    END IF;
    IF b_name IS NOT NULL THEN
        SELECT INTO _b_id b_id
            FROM TableA WHERE b_name = b;
        IF NOT FOUND THEN
            RAISE EXCEPTION 'INVALID B NAME: %', b;
        END IF;
    END IF;
    INSERT INTO Data (session_id, a_id, b_id) VALUES (ssid, _a_id, _b_id);
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;

This works just fine, but it's not very fast. I need to make it faster, so my other approach was with subselects:

...
BEGIN
    INSERT INTO Data (session_id, a_id, b_id)
        VALUES (
            (SELECT session_id FROM Session WHERE session_id = ssid),
            CASE WHEN a IS NULL THEN
                NULL
            ELSE
                (SELECT a_id FROM TableA WHERE a_name = a)
            END,
            CASE WHEN b IS NULL THEN
                NULL
            ELSE
                (SELECT b_id FROM TableB WHERE b_name = b)
            END
        );
    -- but no error handling :(
END;
...

This is a bit faster, but I can't figure out how to figure out which subselect failed and what error to report.

My question: is there a way to make it faster while keeping specific error messages?

The solution must work with postgres 8.4.

What's the expected (rough) ratio of success / fail for your function? Also, as *always*, add your version of Postgres, please. — Erwin Brandstetter, Jul 23 '15 at 17:02
@ErwinBrandstetter I haven't measured it, but it should be something like 99% success, 1% fail. Version is 8.4 (in some instances it runs on 9.1, but it has to run on 8.4) — Jan Spurny, Jul 23 '15 at 22:48
Version numbers are *always* important to questions here - even more so when using outdated software ([pg 8.4 has reached EOL 1 year ago](http://www.postgresql.org/support/versioning/)). My answer should work for 8.4 as well, but I did not test. There were two typos, now fixed. — Erwin Brandstetter, Jul 24 '15 at 00:33

score 1 · Accepted Answer · edited May 23 '17 at 11:51

Assuming current Postgres 9.4.
Use the RETURNING clause of the INSERT statement to check after the INSERT:

CREATE OR REPLACE FUNCTION store_data(ssid int, a text, b text)
  RETURNS void AS
$func$
DECLARE
   _rec record;
BEGIN
   INSERT INTO data (session_id, a_id, b_id)
   VALUES ((SELECT t.session_id FROM session t WHERE t.session_id = $1)
         , (SELECT t.a_id       FROM tablea  t WHERE t.a_name = $2)
         , (SELECT t.b_id       FROM tableb  t WHERE t.b_name = $3))   -- tableb!
   RETURNING *
   INTO _rec;

   IF _rec.session_id IS NULL THEN  -- cannot be NULL
      RAISE EXCEPTION 'INVALID SESSION: %', ssid;
   ELSIF _rec.a_id IS NULL AND a IS NOT NULL THEN  -- allow NULL input
      RAISE EXCEPTION 'INVALID A NAME: %', a;
   ELSIF _rec.b_id IS NULL AND b IS NOT NULL THEN
      RAISE EXCEPTION 'INVALID B NAME: %', b;
   END IF;
END
$func$ LANGUAGE plpgsql SECURITY DEFINER
                        SET search_path = public, pg_temp; -- adapt

Each subselect results in a NULL value if the row in the look-up table cannot be found. Hence, exactly one row is always inserted (and returned).

Be wary of naming conflicts between parameters, variables and column names that have not been table-qualified.

You should probably provide a search_path when using SECURITY DEFINER. Details:

How does the search_path influence identifier resolution and the "current schema"

If you had NOT NULL constraints on the columns a_id and b_id in table data as well, then all you would need is:

   INSERT INTO data (session_id, a_id, b_id)
   VALUES ((SELECT ssid FROM session t WHERE t.session_id = $1)
         , (SELECT t.a_id FROM tablea  t WHERE t.a_name = $2)
         , (SELECT t.b_id FROM tableb  t WHERE t.b_name = $3));

If one of the values results in being NULL, you get an error message telling you which NOT NULL constraint has been violated.

You may or may not want to insert missing values in the look-up tables:

Is SELECT or INSERT in a function prone to race conditions?

Thanks, but this won't work, because only `session_id` is `NOT NULL`. `a_id` and `b_id` can be `NULL`. I should've mentioned that, my bad. I'll edit the question (it was lost in anonymization). As for naming conflicts - that's why I use short non-conflicting parameter names instead of `$1` notation, while every column has unique longer name, so I don't have name all tables in selects and I can use `JOIN USING ()` syntax. As for a search path, this was anonymized - in real code, I use fully qualified names like `namespaceX.funcY` for "my" db objects. — Jan Spurny, Jul 23 '15 at 23:10
@JanSpurny: My first (main) version of the function does not assume `NOT NULL` constraints and should work *as is*. You could omit the check for `_rec.session_id` in the function as this is covered by a `NOT NULL` constraint, but it won't cost much to have it anyway. `SECURITY DEFINER`: Yes, this is good practice. I would still make it a habit to follow recommendations for [Writing `SECURITY DEFINER` Functions Safely](http://www.postgresql.org/docs/current/interactive/sql-createfunction.html#SQL-CREATEFUNCTION-SECURITY). — Erwin Brandstetter, Jul 24 '15 at 00:10
yes, it will work, but I'd have to add `AND a IS NOT NONE` to `IF _a_id IS NULL` to make sure I handle `NULL` in `a` and `b` parameters correctly. Thanks. — Jan Spurny, Jul 24 '15 at 11:28

INSERT with many subselects - performance and error selectivity

1 Answers1